TO APPEAR IN THE IEEE TRANSACTIONS ON INFORMATION THEORY 



High-Rate Vector Quantization for the Neyman-Pearson Detection of Correlated Processes



Jeffrey Villard, Student Member, IEEE, and Pascal Bianchi, Member, IEEE 



This paper investigates the effect of quantization on the performance of the Neyman-Pearson test. It 
is assumed that a sensing unit observes samples of a correlated stationary ergodic multivariate process. 
Each sample is passed through an $N$-point quantizer and transmitted to a decision device which performs
a binary hypothesis test. For any false alarm level, it is shown that the miss probability of the
Neyman-Pearson test converges to zero exponentially as the number of samples tends to infinity, assuming that the
observed process satisfies certain mixing conditions. The main contribution of this paper is to provide a
compact closed-form expression of the error exponent in the high-rate regime, i.e., when the number $N$
of quantization levels tends to infinity, generalizing previous results of Gupta and Hero to the case of 

I. Introduction

Consider a sensing unit which transmits a sequence of measurements to a decision device (DD) 
whose mission is to detect a given signal. For example, a CCTV camera in a surveillance system 
transmits its data to a remote controller interested in the detection of a particular object in its 
field of view. This situation also arises in the context of wireless sensor networks (WSN) where 
a fusion center collects the individual measurements of a large number of identical sensors and 
processes these measurements in order to detect abnormal events [1], [2]. In such applications,
due to bandwidth, delay or storage limitations, transmitted data rates are often limited. Therefore, 
measurements must be quantized prior to transmission. As a matter of fact, this quantization step 
may severely degrade the overall detection performance of the system. 

In this paper, we consider that a binary hypothesis test is performed at the DD. The available 
data set corresponds to a quantized version of a stationary ergodic discrete-time multivariate 
process. Our aim is to quantify the detection performance of a given quantizer and characterize 
quantization strategies which guarantee attractive performance at the DD. 

In the past decades, numerous papers were dedicated to the search for relevant quantization
strategies and their practical design [3]. The most popular criterion used to select quantizers is the
mean square error (MSE) between the quantized signal and the initial source [4]. An analytical
characterization of quantizers minimizing the MSE is difficult in the general case. Bennett [5]
pioneered the study of high-rate (or high-resolution) quantization for the reconstruction of
scalar signals. The idea of Bennett was to study the MSE in the asymptotic regime where
the number of quantization levels tends to infinity. A closed-form expression of the (properly
normalized) MSE can be determined in that case, and the families of quantizers minimizing the
asymptotic MSE can be directly characterized. The extension of the work of Bennett to vector-valued
observations was later achieved in [6]. However, the MSE criterion is especially relevant when
the aim is to reconstruct the source. On the other hand, it can be inappropriate as far as other
applications are concerned. For this reason, various distortion measures have been proposed in
the literature in a task-oriented setting for estimation, classification and detection [7]-[18]. In
particular, considerable attention has been paid to optimal quantization for hypothesis testing.
Poor and Thomas [12] used Ali-Silvey distances between densities. Later, Poor [13] proposed the
generalized $f$-divergence and studied this distortion measure in the high-rate regime. Picinbono
and Duvaut [14] considered a deflection criterion and proved that the corresponding optimal
procedure corresponds to the scalar quantization of the likelihood ratio. Tsitsiklis [15] studied
the properties of such quantizers with respect to several distortion measures. More recently,
following the initial works of Tenney and Sandell [16] and Tsitsiklis [17], Gupta and Hero [18]
investigated the selection of high-rate quantizers for binary hypothesis tests. In their setting, the
decision device gathers a sequence of $n$ independent and identically distributed (i.i.d.) variables,
each of these variables being passed through a fixed quantizer. The probability density function
(pdf) of the samples is assumed to be known both under the null hypothesis and the alternative.
In this case, it is well known that a uniformly most powerful test is obtained by the
Neyman-Pearson (NP) procedure, which consists in rejecting the null hypothesis when the log-likelihood
ratio (LLR) exceeds a certain threshold [19]. The threshold is usually chosen in such a way
that the probability of false alarm of the test (that is, the probability of deciding the alternative
under the null hypothesis) is fixed to a specified level, say $\alpha$. The performance of the NP test of
level $\alpha$ can be evaluated in terms of the miss probability (that is, the probability of deciding the
null hypothesis under the alternative). In our case, the miss probability clearly depends on the
quantizer used by the sensing unit. Thus, a natural approach would be to select the quantizer
which minimizes the miss probability. Unfortunately, the miss probability does not admit any
tractable expression as a function of the quantizer. To circumvent this issue, it is convenient to
study the miss probability in the case where the number $n$ of available snapshots tends to infinity.
In case of i.i.d. observations, the celebrated Stein's lemma [20] states that the miss probability
tends to zero exponentially in $n$. Based on this result, it is relevant to select the quantizers which
yield a large value of the error exponent. Unfortunately, the maximization of the error exponent
as a function of the quantizer is impractical. Following the idea of [5], Gupta and Hero [18]
restrict their attention to high-rate quantizers and manage to obtain a compact expression of the
error exponent loss induced by quantization.

Most of these works address the case where observations are independent random variables.
However, the detection of a correlated process is a crucial issue in many applications [21]-[24].
In this case, fewer results are available in the literature. Chamberland and Veeravalli [21] analyze
the impact of the density of sensors in a WSN on the detection performance, when observations
are correlated. Willett et al. [22] study the one-bit quantization of a pair of dependent Gaussian
random variables. In case of the detection of a Gauss-Markov signal in noise, Sung et al. [23]
prove that for a fixed false alarm level, the miss probability of the NP test converges exponentially
to zero, and provide a closed-form expression of the error exponent. Hachem et al. [24] later
extended the results of [23] to irregularly sampled Gaussian diffusion processes. However, [23],
[24] assume that the DD has perfect access to the observations of the sensing unit, and do not
address quantization issues.

In this paper, we study the performance of the Neyman-Pearson test based on a quantized
version of a stationary ergodic multivariate process. We generalize the work of Gupta and
Hero [18] to the case where the observed process is non-i.i.d. (either under the null hypothesis,
the alternative, or both). In this situation, Stein's lemma does not directly apply. The error
exponent no longer admits a closed-form expression and the determination of relevant
quantizers is therefore a more difficult task. Provided that the process of interest satisfies
certain forgetting properties (present observations should become nearly independent of past
observations after a sufficient amount of time), we prove that the miss probability of the NP
test of level $\alpha$ tends exponentially to zero as the number of observations tends to infinity. Our
main contribution is to provide a compact closed-form expression of the error exponent in
case of high-rate quantizers. If $N$ denotes the number of quantization levels (or equivalently
if each measurement is quantized on $\log_2(N)$ bits), we prove that the error exponent achieved
when using quantized observations converges as $N$ tends to infinity to the ideal error exponent
that one would obtain if perfect/unquantized measurements were available at the DD. More
precisely, we prove that the error exponent loss tends to zero at speed $N^{-2/d}$, where $d$ represents
the dimension of each individual measurement. The asymptotic error exponent depends on the
process distributions under both hypotheses. It also depends on the quantization strategy through
the so-called model point density and model covariation profile. The model point density can be
interpreted as the asymptotic density of cells in the neighborhood of each point of the observation
space. The model covariation profile captures the shape of the cells. As a consequence, the
selection of relevant high-rate quantizers reduces to the determination of the point densities and
covariation profiles minimizing the asymptotic error exponent loss. In case of scalar quantization
($d = 1$), our compact expression immediately yields a simple characterization of optimal
high-rate quantizers. In case of vector quantization ($d \ge 2$), an exact characterization of optimal
quantizers is more difficult. Following the approach of [18] once again, we nevertheless determine
relevant families of quantizers with attractive error exponent. Note that our theoretical results
hold under the assumption that the observed process "forgets" past observations fast enough. As
a special case, we prove that our assumptions hold for a general class of hidden Markov models
verifying a certain contraction property. Numerical illustrations are provided in the case where
the measurements correspond to a modulated signal in the In-phase/Quadrature plane.

The paper is organized as follows. In Section II, we describe the observation model. We
also review some known results on Neyman-Pearson tests and we derive the associated error
exponent in the ideal case where the DD has perfect access to the measurements. The vector
quantization framework is introduced in Section III. In Section IV, the impact of quantization
on the error exponent is evaluated in the high-rate regime. We determine relevant quantization
strategies that reduce this degradation. Section V is devoted to the proof of the main
result. In Section VI, we illustrate our findings in the special case of hidden Markov processes
and give sufficient conditions on the transition and observation kernels ensuring that our results
apply. Section VII is dedicated to numerical illustrations.



Notation 

For any sequence $(y_i)_{i\in\mathbb{Z}}$ and any integers $k \le l$, notation $y_{k:l}$ stands for the collection
$(y_k, y_{k+1}, \dots, y_l)$ and notation $y_{\mathbb{Z}}$ is used to designate the whole sequence. If $y$ is a vector with
dimension $d$, we denote by $y^{(i)}$ its $i$-th component and by $\|y\|$ its Euclidean norm. We denote by
$\|A\|$ the spectral norm of any square matrix $A$. Notation $^{T}$ stands for the transpose operator.

A real-valued function $f : y_{k:l} \mapsto f(y_{k:l})$ on $S \subset \mathbb{R}^d \times \dots \times \mathbb{R}^d$ is said to be of class $C^3$
on $S$ if it is three times continuously differentiable on $S$. We denote by $\nabla_{y_m} f(y_{k:l})$ its gradient
w.r.t. $y_m$ at point $y_{k:l}$. When no variable is specified, $\nabla g(y)$ simply denotes the ($d$-dimensional)
gradient of the real-valued single-variable function $y \mapsto g(y)$ defined on $\mathsf{Y} \subset \mathbb{R}^d$. We define the
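Hessian matrix of $f$ by $\left[\nabla_{y_m}\nabla_{y_{m'}}^{T} f\right]_{i,j} = \dfrac{\partial^2 f}{\partial y_m^{(i)}\,\partial y_{m'}^{(j)}}$ for all $i, j \in \{1, \dots, d\}$. Moreover, notation
[...] stands for [...].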

Notation $\mathcal{B}(\mathsf{X})$ stands for the Borel $\sigma$-field on $\mathsf{X}$. Notation $\sigma(Y_{1:n})$ stands for the sub-$\sigma$-field
of $\mathcal{B}(\mathsf{Y}^{\mathbb{Z}})$ associated with the random vector $Y_{1:n}$. Notation $\xrightarrow[n\to\infty]{P}$ stands for the convergence
in probability as $n \to \infty$. Notation $\xrightarrow[n\to\infty]{L^r}$ stands for the convergence in the $L^r$-norm w.r.t.
probability $P_0$.

Notation $\circ$ stands for the composition operator, i.e., for any arbitrary functions $f$ and $g$,
$f \circ g(x) = f(g(x))$. Notation $o_N(\cdot)$ is a little-o notation as $N$ tends to infinity.

II. Neyman-Pearson Detection with Perfect Observations
A. Observation Model 

Consider two probability measures $P_0$ and $P_1$ on a relevant probability space. Denote by
$(Y_k)_{k\in\mathbb{Z}}$ a stationary ergodic process for both $P_0$ and $P_1$, taking its values in a bounded convex
subset $\mathsf{Y}$ of $\mathbb{R}^d$. We associate a hypothesis (H0 and H1 respectively) to each of the two
probability measures $P_0$ and $P_1$ and investigate the problem of the detection of H1 vs. H0
based on a set of $n$ observations $Y_{1:n} = (Y_1, \dots, Y_n)$.

For each $i \in \{0, 1\}$, we assume that $P_i$ is the probability distribution of the coordinate process
$(Y_k)_{k\in\mathbb{Z}}$ on the canonical space $(\mathsf{Y}^{\mathbb{Z}}, \mathcal{B}(\mathsf{Y}^{\mathbb{Z}}))$. We denote by $P_{i,n}$ the restriction of $P_i$ to $\sigma(Y_{1:n})$.
We denote by $E_0$ and $E_1$ the expectations associated with $P_0$ and $P_1$ respectively. We introduce
the reference measure $\mu$ which coincides with the $d$-dimensional Lebesgue measure restricted
to $\mathsf{Y}$.

Assumption 1: The following properties hold true for each $i \in \{0, 1\}$.

1) For each $n \ge 1$, $P_{i,n}$ admits a density $p_i$ w.r.t. $\mu^{\otimes n}$.

2) $p_i(y_{1:n}) > 0$ for each $y_{1:n} \in \mathsf{Y}^n$.

3) $E_0 \left|\log p_i(Y_0)\right| < \infty$.

The density $p_i$ of $P_{i,n}$ depends of course on $n$, but we drop the index $n$ to simplify the notation.
For each $i \in \{0, 1\}$, we also define $p_i(y_n \mid y_{1:n-1}) = p_i(y_{1:n}) / p_i(y_{1:n-1})$ with the convention that
$p_i(y_n \mid y_{1:n-1}) = p_i(y_n)$ when $n = 1$ (that is, when $y_{1:n-1}$ is a void vector). Assumption 1-2)
implies that both distributions $P_{0,n}$ and $P_{1,n}$ are absolutely continuous w.r.t. each other.



B. Likelihood Ratio Test 

We now investigate the detection of H1 vs. H0 based on the perfect observation of $n$
measurements $Y_{1:n}$. The log-likelihood ratio (LLR) writes:

$$L_n = \log\frac{p_1(Y_{1:n})}{p_0(Y_{1:n})} . \qquad (1)$$

The NP test rejects the null hypothesis when $L_n$ is larger than a threshold, say $\gamma$. For each
$\alpha \in (0, 1)$, we define the miss probability of the NP test of level $\alpha$ by:

$$\beta_n(\alpha) = \inf_{\gamma} P_1\left[L_n < \gamma\right] ,$$



May 2011 



DRAFT 






where the infimum is w.r.t. all $\gamma$ such that the probability of false alarm does not exceed $\alpha$, i.e.,

$$\gamma \ \text{ s.t. } \ P_0\left[L_n > \gamma\right] \le \alpha .$$

For each $n \ge 1$ and each $\alpha \in (0, 1)$, due to the celebrated Neyman-Pearson lemma, $\beta_n(\alpha)$
is the lowest achievable miss probability among all binary tests of level $\alpha$ which are based on
the observation of $Y_{1:n}$. Quantity $\beta_n(\alpha)$ is therefore a key metric in order to characterize the
performance of the hypothesis test. Unfortunately, it usually does not admit any tractable
closed-form expression. In the sequel, we study the asymptotic behaviour of $\beta_n(\alpha)$ as the number of
observations $n$ tends to infinity. In this regime, it can be shown that, under certain assumptions,

$$\beta_n(\alpha) \simeq \exp(-n K) \qquad (2)$$

for some constant $K$ given below, which we shall refer to as the error exponent.
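
As an illustration of the exponential decay in (2), the following minimal sketch evaluates the NP miss probability exactly in a deliberately simple case that is not the model of this paper: $n$ i.i.d. scalar Gaussian samples, $\mathcal{N}(0,1)$ under H0 and $\mathcal{N}(\mu,1)$ under H1 (unbounded support, chosen only because the miss probability is available in closed form). In that case the error exponent equals the Kullback-Leibler divergence $\mu^2/2$. The function names and the parameter `mu` are assumptions made for this example.

```python
import math
from statistics import NormalDist


def miss_probability(n: int, alpha: float, mu: float) -> float:
    """Exact NP miss probability for n i.i.d. samples, N(0,1) under H0 vs N(mu,1) under H1 (mu > 0)."""
    # The LLR is increasing in the sample sum, so the NP test thresholds the sum.
    # The normalised sum S/sqrt(n) is N(0,1) under H0 and N(sqrt(n)*mu, 1) under H1.
    z = NormalDist().inv_cdf(1.0 - alpha)   # threshold giving false alarm probability alpha
    x = z - math.sqrt(n) * mu               # same threshold, centred under H1
    # Standard normal cdf written with erfc to stay accurate far in the lower tail.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def exponent_estimate(n: int, alpha: float, mu: float) -> float:
    """Finite-n estimate -(1/n) log beta_n(alpha) of the error exponent."""
    return -math.log(miss_probability(n, alpha, mu)) / n


if __name__ == "__main__":
    for n in (100, 400, 900):
        print(n, exponent_estimate(n, alpha=0.05, mu=1.0))  # approaches mu^2 / 2 = 0.5 from below
```

The convergence is slow (the gap decays like $1/\sqrt{n}$ in this example), which is one reason the exponent is studied as a limit rather than read off at moderate $n$.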

C. Error Exponent with Perfect Observations 

The evaluation of the error exponent $K$ in Equation (2) fundamentally relies on the following
lemma:

Lemma 1 ([25]): Assume that a binary test is performed on a sequence $Y_{1:n} = (Y_1, \dots, Y_n)$
of $n$ observed random variables. Denote by $p_0$ and $p_1$ the density of $Y_{1:n}$ under H0 and H1
respectively (w.r.t. any common reference measure). Assume that under H0,

$$\frac{1}{n} \log\frac{p_0(Y_{1:n})}{p_1(Y_{1:n})} \xrightarrow[n\to\infty]{P} \kappa$$

for some deterministic constant $\kappa$ such that $0 < \kappa < \infty$. Then, for any $\alpha \in (0, 1)$ the miss
probability $\beta_n(\alpha)$ of the Neyman-Pearson test of level $\alpha$ is such that

$$\lim_{n\to\infty} \frac{1}{n} \log \beta_n(\alpha) = -\kappa .$$

Lemma 1 implies that the error exponent, if it exists, coincides with the limit in probability
(under $P_0$) of $-(1/n)L_n$, where $L_n$ is the LLR defined by (1). The existence of the error exponent
is directly obtained from the following assumption, which will be discussed later on.

Assumption 2: For each $i \in \{0, 1\}$, $\left(\log p_i(Y_0 \mid Y_{-m:-1})\right)_{m\ge 0}$ is a convergent sequence in
$L^1(P_0)$.

We are now in position to study the limit of the LLR $L_n$ and prove the following result, which
provides the general form of the error exponent.

Theorem 1: Under Assumptions 1 and 2,

$$\lim_{n\to\infty} \frac{1}{n} \log \beta_n(\alpha) = -K$$

where $K$ is the constant defined by

$$K = \lim_{m\to\infty} E_0\left[\log\frac{p_0}{p_1}(Y_0 \mid Y_{-m:-1})\right] . \qquad (3)$$



Proof: Using the chain rule, we first write $L_n$ under the form:

$$L_n = -\sum_{k=1}^{n} \log\frac{p_0}{p_1}(Y_k \mid Y_{1:k-1}) .$$

Denote by $T$ the limit in $L^1(P_0)$ of sequence $\left(\log\frac{p_0}{p_1}(Y_0 \mid Y_{-m:-1})\right)_{m\ge 0}$. The main point is the
study of the difference $\log\frac{p_0}{p_1}(Y_k \mid Y_{1:k-1}) - T\circ\theta^k$, where $\theta$ is the shift operator^1. We can write:

$$E_0\left| -\frac{1}{n}L_n - \frac{1}{n}\sum_{k=1}^{n} T\circ\theta^k \right|
\overset{(a)}{\le} \frac{1}{n}\sum_{k=1}^{n} E_0\left| \log\frac{p_0}{p_1}(Y_k \mid Y_{1:k-1}) - T\circ\theta^k \right|
\overset{(b)}{\le} \frac{1}{n}\sum_{k=1}^{n} E_0\left| \log\frac{p_0}{p_1}(Y_0 \mid Y_{-k+1:-1}) - T \right| ,$$

where step (a) comes from the triangular inequality and step (b) is a consequence of the
stationarity of process $(Y_k)_{k\in\mathbb{Z}}$ under $P_0$. The right-hand side of the above inequality can be
interpreted as a Cesàro mean and thus converges to zero by definition of $T$. We thus write:

$$-\frac{1}{n}L_n = \frac{1}{n}\sum_{k=1}^{n} T\circ\theta^k + \epsilon_n$$

where $\epsilon_n$ represents a term which converges in probability (under $P_0$) to zero as $n \to \infty$. As
$P_0$ is stationary ergodic, we conclude using the ergodic theorem that $-(1/n)L_n$ converges in
probability to $E_0(T)$ under $P_0$. This result together with Lemma 1 proves Theorem 1. ■
Remark 1: Let us make some remarks on the above Assumptions 1 and 2. Assumption 1 is an
extension of those made by Gupta and Hero [18, Section III, p. 1956]. Assumption 2 does not
appear in [18] since it is obviously verified by i.i.d. processes. In this case, Theorem 1 is known
as Stein's lemma. Assumption 2 is trivially satisfied by short-dependent ($m$-dependent) processes
such as moving average processes for instance [26]. In this case, the present observation $Y_0$ is
independent of past observations $Y_{-m-1}, Y_{-m-2}, \dots$ as soon as $m$ is large enough. As explained
in Section VI, Assumption 2 is also satisfied by a wide class of hidden Markov models.

^1 Recall that we are considering probability measures defined on the canonical space $\mathsf{Y}^{\mathbb{Z}}$. For any $\omega \in \mathsf{Y}^{\mathbb{Z}}$, we may write $\omega =
(\dots, \omega_{-1}, \omega_0, \omega_1, \dots)$. The $k$th-time shifted version of $\omega$ is then given by $\theta^k\omega = (\dots, \omega_{k-1}, \omega_k, \omega_{k+1}, \dots)$. Notation $T\circ\theta^k$
represents the measurable function $T\circ\theta^k(\omega) = T(\theta^k\omega) = T((\dots, \omega_{k-1}, \omega_k, \omega_{k+1}, \dots))$. Recall that process $Y_{\mathbb{Z}}$ is defined as
the coordinate process, i.e., $Y_n(\omega) = \omega_n$ for each $n$. As a consequence, the measurable function $\log\frac{p_0}{p_1}(Y_k \mid Y_{1:k-1}) - T\circ\theta^k$ at
point $\omega$ is equal to the measurable function $\log\frac{p_0}{p_1}(Y_0 \mid Y_{-k+1:-1}) - T$ at point $\theta^k\omega$.



Remark 2: In order that $\left(\log p_0(Y_0 \mid Y_{-m:-1})\right)_{m\ge 0}$ is a convergent sequence in $L^1(P_0)$, it is
sufficient to check that $\left(E_0 \log p_0(Y_0 \mid Y_{-m:-1})\right)_{m\ge 0}$ is a bounded sequence. This claim is a
consequence of Moy [27] (see Theorem 4 therein). In practical situations, this remark provides us
with a convenient way to check whether Assumption 2 is verified for $i = 0$. On the other hand,
the validation of Assumption 2 for $i = 1$ generally requires more effort in practice: one should
be able to prove that $\left(\log p_1(Y_0 \mid Y_{-m:-1})\right)_{m\ge 0}$ is a Cauchy sequence in $L^1(P_0)$.

Remark 3: When $P_1$ is a finite-order Markovian measure, Assumption 2 can be simply reduced
to the assumption that sequence $\left(E_0 \log\frac{p_0}{p_1}(Y_0 \mid Y_{-m:-1})\right)_{m\ge 0}$ is bounded. Indeed, due to Moy [27],
this hypothesis directly implies the convergence of sequence $\left(\log\frac{p_0}{p_1}(Y_0 \mid Y_{-m:-1})\right)_{m\ge 0}$ in $L^1(P_0)$
and thus yields Theorem 1.

III. Quantization 

A. Definitions 

Consider a fixed integer $N \ge 2$. An $N$-point quantizer is a triplet $(\mathcal{C}_N, \Xi_N, \xi_N)$ where $\mathcal{C}_N =
\{C_{N,1}, \dots, C_{N,N}\}$ is a set of $N$ cells (Borel sets of $\mathsf{Y}$ with non-zero volume) which form a
partition of $\mathsf{Y}$, where $\Xi_N = \{\xi_{N,1}, \dots, \xi_{N,N}\}$ is an arbitrary set of $N$ distinct elements and where
$\xi_N : \mathsf{Y} \to \Xi_N$ is a function s.t. $\xi_N(y) = \xi_{N,j}$ whenever $y \in C_{N,j}$. For each $N$, $k$, we introduce

$$Z_{N,k} = \xi_N(Y_k) ,$$

the quantized measurement on $\log_2(N)$ bits. We assume that the quantizer $(\mathcal{C}_N, \Xi_N, \xi_N)$ is known
at the decision device. The aim is to decide between hypotheses H0 and H1 based on the
observation of $Z_{N,1:n}$.

Note that in our model, each individual measurement is quantized based on the same
quantization rule, as in the traditional framework of vector quantization [6]. It is also relevant in the
case of WSN when samples are collected using identical sensors.
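
To make the triplet (cells, alphabet, quantization map) concrete, here is a minimal sketch of an $N$-point uniform scalar quantizer. It assumes, purely for illustration, that the observation space is the interval $[0, 1)$ with $d = 1$ and that the alphabet consists of the cell midpoints; the function name `uniform_quantizer` is hypothetical.

```python
from bisect import bisect_right


def uniform_quantizer(n_levels: int):
    """N-point uniform quantizer on [0, 1): returns (cells, points, quantize)."""
    edges = [j / n_levels for j in range(n_levels + 1)]
    # Cells are the half-open intervals [a_j, b_j); together they partition [0, 1).
    cells = [(edges[j], edges[j + 1]) for j in range(n_levels)]
    # Alphabet: one distinct point per cell, here the midpoint (centroid) of the cell.
    points = [(a + b) / 2.0 for a, b in cells]

    def quantize(y: float) -> float:
        """Map y to the alphabet point of the cell that contains it."""
        if not 0.0 <= y < 1.0:
            raise ValueError("y must lie in [0, 1)")
        j = min(bisect_right(edges, y) - 1, n_levels - 1)
        return points[j]

    return cells, points, quantize


if __name__ == "__main__":
    cells, points, quantize = uniform_quantizer(4)
    samples = [0.10, 0.25, 0.60, 0.99]
    print([quantize(y) for y in samples])  # each sample replaced by its cell midpoint
```

Applying `quantize` to every sample with the same rule mirrors the setting above, in which all measurements share a single quantizer.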

B. Error Exponent

Assume that the number $N$ of quantization levels is fixed. For a given number $n$ of quantized
observations, we define the LLR based on quantized measurements by:

$$L_{n,N} = \log\frac{p_{1,N}(Z_{N,1:n})}{p_{0,N}(Z_{N,1:n})} ,$$

where for each $i \in \{0, 1\}$ and for any set of quantization points $\xi_{N,j_{1:n}} = (\xi_{N,j_1}, \dots, \xi_{N,j_n}) \in \Xi_N^n$,

$$p_{i,N}(\xi_{N,j_{1:n}}) = P_{i,n}(C_{N,j_1} \times \dots \times C_{N,j_n})$$

is the probability that measurements $Y_1, \dots, Y_n$ respectively fall into the cells $C_{N,j_1}, \dots, C_{N,j_n}$
associated with the observed points $\xi_{N,j_1}, \dots, \xi_{N,j_n}$ (n.b. function $p_{i,N}$ depends on $n$, but we
omit the index $n$ to simplify notation). We define similarly:

$$p_{i,N}(\xi_{N,j_n} \mid \xi_{N,j_{1:n-1}}) = \frac{p_{i,N}(\xi_{N,j_{1:n}})}{p_{i,N}(\xi_{N,j_{1:n-1}})} .$$

For each $\alpha \in (0, 1)$, we denote by $\beta_{n,N}(\alpha)$ the miss probability of the NP test of level $\alpha$
when quantization is applied, i.e., the infimum of $P_1[L_{n,N} < \gamma]$ w.r.t. all $\gamma$ s.t. $P_0[L_{n,N} > \gamma] \le \alpha$.
The error exponent associated with $\beta_{n,N}(\alpha)$ is provided by the following result, whose proof is
similar to the one of Theorem 1.

Corollary 1: Consider a fixed $N \ge 2$. If Assumption 1 holds and if $\left(\log p_{i,N}(Z_{N,0} \mid Z_{N,-m:-1})\right)_{m\ge 0}$
is a convergent sequence in $L^1(P_0)$ for each $i \in \{0, 1\}$ then,

$$\lim_{n\to\infty} \frac{1}{n} \log \beta_{n,N}(\alpha) = -K_N ,$$

where $K_N$ is the constant defined by:

$$K_N = \lim_{m\to\infty} E_0\left[\log\frac{p_{0,N}}{p_{1,N}}(Z_{N,0} \mid Z_{N,-m:-1})\right] . \qquad (4)$$
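
The limit in (4) is rarely explicit. As a minimal sketch of what it looks like in one tractable case, assume (purely for illustration, since a quantized correlated process is not Markov in general) that the discrete process is a stationary first-order Markov chain under both hypotheses, with transition matrices `p0` and `p1`. The conditional probabilities then stop depending on $m$ as soon as $m \ge 1$, and the limit reduces to the relative entropy rate between the two chains. All names below are hypothetical.

```python
import math


def stationary_distribution(p, iters=10_000):
    """Stationary law of an ergodic transition matrix p (list of rows), by power iteration."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[a] * p[a][b] for a in range(n)) for b in range(n)]
    return pi


def markov_error_exponent(p0, p1):
    """E0[log p0/p1 (Z_0 | Z_{-1})] for first-order chains: sum_a pi0(a) sum_b p0(a,b) log(p0(a,b)/p1(a,b)).

    Requires p1[a][b] > 0 wherever p0[a][b] > 0.
    """
    pi0 = stationary_distribution(p0)
    size = len(p0)
    return sum(
        pi0[a] * p0[a][b] * math.log(p0[a][b] / p1[a][b])
        for a in range(size)
        for b in range(size)
        if p0[a][b] > 0.0
    )


if __name__ == "__main__":
    p0 = [[0.9, 0.1], [0.2, 0.8]]   # correlated chain under H0
    p1 = [[0.5, 0.5], [0.5, 0.5]]   # i.i.d. uniform symbols under H1
    print(markov_error_exponent(p0, p1))
```

When both matrices have identical rows the chains are i.i.d. and the value reduces to the usual Kullback-Leibler divergence, in line with Stein's lemma.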



The above result provides the error exponent $K_N$ associated with the NP test on quantized
observations. A natural question is: how does the choice of the quantizer affect the error
exponent? Unfortunately, the expression of the error exponent does not immediately allow one
to evaluate the impact of the quantizer. In the sequel, we thus follow the approach of [6],
[18] and focus on the case where the order $N$ of the quantizer tends to infinity. We refer to
such quantizers as high-rate quantizers. This approach leads to a convenient and informative
asymptotic expression of $K_N$. In particular, it will be shown that, under some assumptions on
the process $(Y_k)_{k\in\mathbb{Z}}$ and the quantizers sequence $(\mathcal{C}_N, \Xi_N, \xi_N)_{N\ge 1}$, the above error exponent
converges to $K$ as $N$ tends to infinity.

IV. Performance of High-Rate Vector Quantizers

A. Notation and Assumptions

For each $N$, we remark that the error exponent $K_N$ does not depend on the particular choice of
the quantization alphabet $\Xi_N$. For the sake of simplicity, we assume with no loss of generality
that^2

$$\xi_{N,j} = \frac{\int_{C_{N,j}} y \, dy}{\int_{C_{N,j}} dy} ,$$

i.e., each $\xi_{N,j}$ coincides with the centroid^3 of cell $C_{N,j}$. We respectively define the volume and the
diameter of cell $j$ by $V_{N,j} = \int_{C_{N,j}} dy$ and $d_{N,j} = \sup_{u,v\in C_{N,j}} \|u - v\|$. We introduce the specific
point density $\zeta_N$ and the specific covariation profile $M_N$ as the piecewise constant functions on
$\mathsf{Y}$ respectively defined as follows, for any $y \in C_{N,j}$ ($j \in \{1, \dots, N\}$):

$$\zeta_N(y) = \zeta_{N,j} = \frac{1}{N V_{N,j}} ,$$

$$M_N(y) = M_{N,j} = \frac{1}{V_{N,j}^{1+2/d}} \int_{C_{N,j}} (y - \xi_{N,j})(y - \xi_{N,j})^{T} \, dy .$$

Now consider a family of quantizers $(\mathcal{C}_N, \Xi_N, \xi_N)_{N\ge 1}$. We make the following assumption.

Assumption 3: The following properties hold true.

1) As $N \to \infty$, $\zeta_N$ converges uniformly to a continuous function $\zeta$ such that $\inf_{y\in\mathsf{Y}} \zeta(y) > 0$.

2) As $N \to \infty$, $M_N$ converges uniformly to a continuous (matrix-valued) function $M$ such
that $\sup_{y\in\mathsf{Y}} \|M(y)\| < \infty$.

3) There exists a constant $C_d$ such that, for all $N$, $\sup_j d_{N,j} \le \dfrac{C_d}{N^{1/d}}$.

We will refer to $\zeta$ as the model point density of the family $(\mathcal{C}_N, \Xi_N, \xi_N)_{N\ge 1}$. It represents the
fraction of cells in the neighborhood of a given point $y$. Function $M$ will be referred to as the
model covariation profile. For each $y \in \mathsf{Y}$, $M(y)$ is a non-negative $d \times d$ matrix. In the literature,
function $y \mapsto \mathrm{Tr}(M(y))$ is usually referred to as the inertial profile [3], [6], [18]. Function $M$
provides information about the shape of the cells.
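
The two profiles are easy to evaluate in the scalar case. The sketch below assumes $d = 1$, interval cells given by their breakpoints, centroids at the cell midpoints, and the Gupta-Hero normalisation $\zeta_{N,j} = 1/(N V_{N,j})$ and $M_{N,j} = V_{N,j}^{-(1+2/d)} \int_{C_{N,j}} (y-\xi_{N,j})^2\,dy$; the function name `specific_profiles` is hypothetical.

```python
def specific_profiles(edges):
    """Per-cell specific point density and covariation profile of a scalar (d = 1) partition.

    edges: increasing breakpoints a_0 < a_1 < ... < a_N; cell j is [a_{j-1}, a_j).
    """
    n_levels = len(edges) - 1
    d = 1
    zeta, m_profile = [], []
    for a, b in zip(edges[:-1], edges[1:]):
        volume = b - a                                  # V_{N,j}
        # For an interval with centroid (a + b) / 2: integral of (y - c)^2 dy = volume^3 / 12.
        second_moment = volume ** 3 / 12.0
        zeta.append(1.0 / (n_levels * volume))          # zeta_{N,j}
        m_profile.append(second_moment / volume ** (1 + 2 / d))   # M_{N,j}
    return zeta, m_profile


if __name__ == "__main__":
    uniform_edges = [j / 8 for j in range(9)]
    zeta, m_profile = specific_profiles(uniform_edges)
    print(zeta[0], m_profile[0])   # uniform cells on [0, 1]: density 1, covariation 1/12
```

In dimension one every cell is an interval, so the normalised covariation is the constant $1/12$ whatever the partition; only the point density distinguishes scalar quantizers, which is why the scalar case admits a simple optimisation.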

^2 The value of the log-likelihood ratio (and a fortiori the value of the error exponent) remains unchanged by any one-to-one
transformation of the quantized observations. Otherwise stated, the particular definition of the quantization alphabet has no
impact on the corresponding Neyman-Pearson test provided that the latter quantization alphabet is composed of $N$ distinct
elements.

^3 The $i$th component of $\xi_{N,j}$ is defined as $\xi_{N,j}^{(i)} = \left(\int_{C_{N,j}} y^{(i)} \, dy\right) / \left(\int_{C_{N,j}} dy\right)$.

Intuitively, high-rate quantizers should be constructed in such a way that $\zeta(y)$ is large at those
points $y$ for which a fine quantization is essential to discriminate the two hypotheses. Theorem 2
below provides a more rigorous formulation of this intuition.

Remark 4: Assumption 3 is essentially the same as the one traditionally made in the high-rate
quantization framework [3], [6], [18]. The main difference lies in Assumption 3-3): usually, the
volume of each cell vanishes at speed $1/N$ while the diameter tends to zero. Our assumption
introduces a constraint on the speed of convergence of the sequence of diameters $\{d_{N,j}\}$, which
ensures that cells shrink at the same speed ($1/N^{1/d}$) on each dimension. Assumption 3 is for
instance valid for sequences of quantizers constructed as companders [3], [5]. Such quantizers
write as the composition of an invertible function (the so-called compressor) and a uniform
quantizer. Since [5], it is known that any scalar quantizer can be written as a compander. Under
mild conditions on the compressor, it can be shown that any sequence of companders with
a given fixed compressor satisfies Assumption 3 (in this case, the model point density $\zeta$ is
fully determined by the first order derivative of the compressor). This point is discussed further in
Section IV.
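
The link between a compressor and the model point density can be checked numerically. The sketch below builds scalar companders on $[0, 1]$ from a hypothetical compressor $g(y) = (y + y^2)/2$, chosen only because its inverse is explicit and $g'(y) = 1/2 + y$ stays bounded away from zero. It measures the uniform gap between the specific point density $1/(N V_{N,j})$ and $g'$ over all cells, which should shrink as $N$ grows.

```python
import math


def g_prime(y):
    """Derivative of the hypothetical compressor g(y) = (y + y^2) / 2 on [0, 1]."""
    return 0.5 + y


def g_inverse(u):
    """Inverse compressor: the root in [0, 1] of y^2 + y - 2u = 0."""
    return 0.5 * (math.sqrt(1.0 + 8.0 * u) - 1.0)


def compander_edges(n_levels):
    """Breakpoints of the compander: uniform cells in the compressed domain, mapped back by g^{-1}."""
    return [g_inverse(j / n_levels) for j in range(n_levels + 1)]


def density_gap(n_levels):
    """Sup over the observation space of |zeta_N(y) - g'(y)|.

    g' is monotone, so on each cell the supremum is reached at one of the two endpoints.
    """
    edges = compander_edges(n_levels)
    gap = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        zeta_nj = 1.0 / (n_levels * (b - a))
        gap = max(gap, abs(zeta_nj - g_prime(a)), abs(zeta_nj - g_prime(b)))
    return gap


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, density_gap(n))   # decreases roughly like 1 / n
```

By the mean value theorem, each cell of width $V_{N,j}$ satisfies $1/N = g'(c)\,V_{N,j}$ for some $c$ in the cell, which is why the specific point density tracks $g'$.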

B. Error Exponent in the High-Rate Regime 

Before stating the main result, we need further assumptions. For each $m \ge 0$ and each
$i \in \{0, 1\}$, define:

$$\eta_i(m) = \sup_{m' \ge m} E_0\left| \log p_i(Y_0 \mid Y_{-m:-1}) - \log p_i(Y_0 \mid Y_{-m':-1}) \right| , \qquad (5)$$

$$\eta_{i,N}(m) = \sup_{m' \ge m} E_0\left| \log p_{i,N}(Z_{N,0} \mid Z_{N,-m:-1}) - \log p_{i,N}(Z_{N,0} \mid Z_{N,-m':-1}) \right| .$$

Note that we already assumed in Theorem 1 and Corollary 1 that sequences $\log p_i(Y_0 \mid Y_{-m:-1})$ and
$\log p_{i,N}(Z_{N,0} \mid Z_{N,-m:-1})$ converge in $L^1(P_0)$ as $m \to \infty$, meaning that $\eta_i(m)$ and $\eta_{i,N}(m)$ tend
to zero. Now coefficients $\eta_i(m)$ and $\eta_{i,N}(m)$ characterize the speed at which $\log p_i(Y_0 \mid Y_{-m:-1})$
and $\log p_{i,N}(Z_{N,0} \mid Z_{N,-m:-1})$ converge to their limits. They are therefore related to the mixing
property of processes $Y_{\mathbb{Z}}$ and $Z_{N,\mathbb{Z}}$ (this point is discussed below in Remark 8). In the sequel,
we will need to ensure that these limits are reached fast enough (see Assumption 4-3) below).

Assumption 4: The following properties hold true.

1) For any $n \ge 1$, $y_{1:n} \mapsto p_i(y_{1:n})$ is of class $C^3$ on $\mathsf{Y}^n$.

2) The third-order partial derivatives of $\log p_i$ are uniformly bounded:

$$\sup \left| \frac{\partial^3 \log p_i}{\partial y_k^{(h)}\, \partial y_l^{(h')}\, \partial y_r^{(h'')}}(y_{1:n}) \right| < \infty ,$$

where the supremum is taken over $n \ge 1$, $y_{1:n} \in \mathsf{Y}^n$, $1 \le k, l, r \le n$ and $1 \le h, h', h'' \le d$.

3) There exist two constants $C_\eta, \epsilon > 0$ such that for each $i \in \{0, 1\}$, $N \ge 2$ and $m \ge 0$,

$$\max\left(\eta_i(m), \eta_{i,N}(m)\right) \le \frac{C_\eta}{m^{\,[\ldots]}} . \qquad (6)$$

4) For each $i \in \{0, 1\}$, each integers $m, m', k$ such that $-m' \le -m \le 0 \le k$:

$$E_0 \left\| \nabla_{y_0} \log p_i(Y_{0:k} \mid Y_{-m:-1}) - \nabla_{y_0} \log p_i(Y_{0:k} \mid Y_{-m':-1}) \right\| \le \varphi_m , \qquad (7)$$

$$E_0 \left\| \nabla_{y_0} \log p_i(Y_k \mid Y_{-m:k-1}) \right\| \le \psi_k , \qquad (8)$$

where $\sum_k \varphi_k$ and $\sum_k \psi_k$ are convergent series.

Assumption 4 will be discussed in detail at the end of the present subsection. Particular
examples of processes satisfying the above assumption are provided in Section VI and in the
numerical results of Section VII. We are now in position to state our main result. Recall that
$p_0(y)$ is the pdf of $Y_0$ under $P_0$. Recall that $K$ and $K_N$ are the error exponents associated with
the NP test in the absence and in the presence of quantization respectively, given by (3) and (4).
Note that Assumption 4-3) implies that both sequences $\eta_i(m)$ and $\eta_{i,N}(m)$ tend to zero. This
guarantees that under Assumption 1 the conclusions of Theorem 1 and Corollary 1 hold true,
i.e., error exponents $K$ and $K_N$ do exist.

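Theorem 2: Under Assumptions 1, 3 and 4, the following statement holds true:
As $N$ tends to infinity, $N^{2/d}(K - K_N)$ converges to a constant $D_e$ given by

$$D_e = \frac{1}{2} \int \frac{p_0(y)\, F(y)}{\zeta(y)^{2/d}} \, dy \qquad (9)$$

where function $F$ is given by

$$F(y) = E_0\left[ \ell(Y_{\mathbb{Z}})^{T} M(Y_0)\, \ell(Y_{\mathbb{Z}}) \,\middle|\, Y_0 = y \right] , \qquad (10)$$

and random variable $\ell(Y_{\mathbb{Z}})$ is the limit in $L^1(P_0)$ of sequence $\left(\nabla_{y_0} \log\frac{p_0}{p_1}(Y_{-k:k})\right)_{k\ge 0}$.

The proof of Theorem 2 is given in Section V.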

Theorem 2 states that when the order $N$ of the quantizer tends to infinity, the error exponent $K_N$
associated with the NP test converges at speed $N^{-2/d}$ to the error exponent $K$ that one would
have obtained in the absence of quantization. Loosely speaking, if $\beta_{n,N}(\alpha)$ represents the miss
probability of the NP test of level $\alpha$, the approximation

$$\beta_{n,N}(\alpha) \simeq \exp\left(-n\left(K - \frac{D_e}{N^{2/d}}\right)\right) \qquad (11)$$

holds when both the number $n$ of sensors and the order $N$ of quantization are large. Quantity $D_e$
represents the (normalized) loss in error exponent between the quantized and the unquantized
cases, in the high-rate quantization regime.

Note that Equation (9) resembles Bennett's formula [5, Equation (1.6)] and its vector
extension for $r$th-power distortion [6, Equation (7)].

Remark 5: As a first consequence of Theorem 2, under some assumptions on the process,
classical quantizers such as those produced in an MSE perspective lead to an error exponent $K_N$
which converges to $K$ as $N$ tends to infinity, at speed $N^{-2/d}$ (see Equation (11) above).

Remark 6: The particular situation where measurements $(Y_k)_{k\ge 0}$ are i.i.d. under both
hypotheses was studied by Gupta and Hero [18]. In this case, function $F(y)$ reduces to:

$$F(y) = \nabla\Lambda(y)^{T} M(y)\, \nabla\Lambda(y) ,$$

where $\Lambda(y) = \log\frac{p_0(y)}{p_1(y)}$ is the single sample LLR. Then, expression (9) of $D_e$ is consistent with
the results of Gupta and Hero (see in particular [18, Equation (20)]).

Note that we assume that each joint density $p_0(y_{1:n})$ and $p_1(y_{1:n})$ is of class $C^3$ on $\mathsf{Y}^n$.
Gupta and Hero's assumption is weaker, since they only assume that "the densities are twice
continuously differentiable on an open set of probability 1" [18, page 1956]. In fact, we need
conditions on the third derivatives of the logarithm of the densities in order to find relevant upper
bounds of the Taylor-Lagrange remainders in the expansion of the joint densities $p_i(y_{-m:n})$ in
the general case (see the detailed proof in Section V).
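
The i.i.d. scalar case lends itself to a quick numerical sanity check of (9). The sketch below assumes $d = 1$, observations on $[0, 1]$, a uniform density under H0, a hypothetical affine density $p_1(y) = 1/2 + y$ under H1, and uniform $N$-point quantizers, for which $\zeta = 1$ and (with the normalisation $M_{N,j} = V_{N,j}^{-3}\int_{C_{N,j}} (y - \xi_{N,j})^2\,dy$) $M = 1/12$. Under these assumptions $F(y) = \Lambda'(y)^2/12$ and (9) evaluates to $D_e = 1/18$; the code compares this with $N^2 (K - K_N)$ computed directly. All function names are illustrative.

```python
import math


def p1(y):
    """Hypothetical H1 density on [0, 1]; the H0 density is uniform (p0 = 1)."""
    return 0.5 + y


def exponent_unquantized():
    """K = integral of p0 log(p0 / p1) over [0, 1], in closed form."""
    def antiderivative(u):           # primitive of log(u)
        return u * math.log(u) - u
    return -(antiderivative(1.5) - antiderivative(0.5))


def exponent_quantized(n_levels):
    """K_N for the N-point uniform quantizer: discrete divergence between the cell masses."""
    total = 0.0
    for j in range(n_levels):
        a, b = j / n_levels, (j + 1) / n_levels
        mass0 = b - a                                  # P0 of the cell
        mass1 = (b - a) * p1(0.5 * (a + b))            # P1 of the cell, exact since p1 is affine
        total += mass0 * math.log(mass0 / mass1)
    return total


def predicted_loss(grid=100_000):
    """D_e = (1/2) * integral of p0 * F / zeta^2, with zeta = 1 and F = (1 / p1)^2 / 12."""
    h = 1.0 / grid
    integral = sum((1.0 / p1((k + 0.5) * h)) ** 2 for k in range(grid)) * h
    return 0.5 * integral / 12.0


if __name__ == "__main__":
    k = exponent_unquantized()
    d_e = predicted_loss()
    for n in (8, 32, 128):
        print(n, n ** 2 * (k - exponent_quantized(n)), d_e)   # both columns close to 1/18
```

The agreement reflects the fact that, in this toy case, $K_N$ is the midpoint-rule approximation of the integral defining $K$, whose leading error term is of order $N^{-2}$.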

Remark 7: We now provide some insights on the meaning of Assumption 4 and on the class
of stationary processes which satisfy the latter. Assumptions 4-1) and 4-2) are mild technical
conditions on the smoothness of the pdf of the observations. They encompass a large family of
stochastic processes and are generally simple to validate on a case-by-case basis. As explained
above, Assumption 4-3) can be interpreted as a condition on the speed at which past observations
are forgotten. Quantities $\eta_i(m)$ and $\eta_{i,N}(m)$ can be interpreted as conditional mixing coefficients
associated with the unquantized and quantized processes $(Y_k)_k$ and $(Z_{N,k})_k$ respectively (see
Remark 8 below). Past observations must be forgotten at least at the polynomial speed
prescribed by (6). Assumption 4-4) can be interpreted similarly as a forgetting property, which no
longer involves the logarithm of the density of the observations, but its derivative. For instance,
Assumption 4 is simple to verify in case of short-dependent processes (such as moving average
processes) provided that the density of the observation is smooth enough. A similar
remark holds for a wide class of Markov chains. In this case, Assumption 4 essentially reduces
to a smoothness assumption on the density of the transition kernel. More generally, we prove
in Section VI that Assumption 4 holds for a wide class of hidden Markov models: we provide
sufficient conditions on the transition kernel such that Assumption 4 holds. See also the numerical
results in Section VII.

Remark 8: It is worth making some remarks on the link between Assumption 4 and standard mixing conditions used in the literature on mixing processes [26], [28], [29]. The mixing property which is the closest to our setting is related to the notion of $\psi$-mixing. For two $\sigma$-fields $\mathcal{U}$ and $\mathcal{V}$, define the following coefficient [26], [28]:

$$\psi(\mathcal{U},\mathcal{V}) = \sup_{U \in \mathcal{U},\, V \in \mathcal{V},\ P(U)>0,\ P(V)>0} \left| 1 - \frac{P(U \cap V)}{P(U)\,P(V)} \right| .$$

Recall that a stochastic process $(Y_k)_{k\in\mathbb{Z}}$ is said to be $\psi$-mixing when the sequence of $\psi$-mixing coefficients $\psi(\sigma(Y_{n+1}), \sigma(Y_{-\infty:0}))$ converges to zero. The classical $\psi$-mixing condition expresses the fact that, loosely speaking, current samples at time $n$ tend to become independent of past samples $Y_0, Y_{-1}, \dots$ as $n$ increases. In our case, we need to ensure that current samples become independent of past ones conditionally to intermediate values $Y_{1:n}$. The usual $\psi$-mixing coefficients do not fully capture this property. In [30], we introduced the following conditional $\psi$-mixing coefficient for $\sigma$-fields $\mathcal{U}$, $\mathcal{V}$ and $\mathcal{W}$:

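$$\psi_i(\mathcal{U},\mathcal{V}\,|\,\mathcal{W}) = \sup_{U \in \mathcal{U},\, V \in \mathcal{V}} \operatorname{ess\,sup} \left| 1 - \frac{P_i(U \cap V \,|\, \mathcal{W})}{P_i(U \,|\, \mathcal{W})\, P_i(V \,|\, \mathcal{W})} \right| ,$$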

where the essential supremum is taken w.r.t. $P_0$ and where we use the convention $0/0 = 1$. The above coefficient can be interpreted as a measure of dependence (under $P_i$) between $\mathcal{U}$ and $\mathcal{V}$ conditionally to $\mathcal{W}$. In particular, it coincides with the traditional $\psi$-mixing coefficient $\psi(\mathcal{U},\mathcal{V})$ when $\mathcal{W}$ is taken to be the whole space $\mathcal{B}(\mathsf{Y}^{\mathbb{Z}})$ and $P = P_0$. For each $n \ge 1$, we further define $\psi_i(n) = \psi_i(\sigma(Y_{n+1}), \sigma(Y_{-\infty:0}) \,|\, \sigma(Y_{1:n}))$ and $\psi_i(0) = \psi_i(\sigma(Y_1), \sigma(Y_{-\infty:0}))$ when $n = 0$. There exists a close link between the above conditional mixing coefficients and the set of coefficients $\eta_i(m)$ introduced earlier. In particular, Theorem 2 is valid when Assumption 4-2) is replaced by the assumption that sequences $\psi_i(n)$ and $\psi_{i,N}(n) = \psi_i(\sigma(Z_{N,n+1}), \sigma(Z_{N,-\infty:0}) \,|\, \sigma(Z_{N,1:n}))$ converge to zero at a suitable polynomial speed in $n$. We refer to [30] for details.
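As a purely illustrative aside, the unconditional $\psi$-mixing coefficient can be evaluated exactly when both $\sigma$-fields are generated by finite-valued random variables: the ratio $P(U \cap V)/(P(U)P(V))$ of any pair of events is a weighted average of the ratios over atoms, so the supremum is reached on atoms. The Python sketch below assumes such a finite setting; the function name `psi_coefficient` and the joint pmf layout are hypothetical choices made for this example, not notation from the paper.

```python
import numpy as np

def psi_coefficient(joint):
    """psi-mixing coefficient between two finite-valued random variables.

    `joint[a, b]` is the joint probability of (A = a, B = b). Atoms with zero
    marginal probability are ignored, as in the definition of the supremum.
    """
    joint = np.asarray(joint, dtype=float)
    pu = joint.sum(axis=1, keepdims=True)   # marginal of the first variable
    pv = joint.sum(axis=0, keepdims=True)   # marginal of the second variable
    prod = pu * pv
    mask = prod > 0
    return float(np.max(np.abs(1.0 - joint[mask] / prod[mask])))

# Independent variables give 0; perfectly coupled fair bits give 1.
print(psi_coefficient([[0.25, 0.25], [0.25, 0.25]]))
print(psi_coefficient([[0.5, 0.0], [0.0, 0.5]]))
```

The conditional coefficient of [30] would additionally require conditioning on a third variable, which this sketch does not attempt.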

The asymptotic loss in error exponent $D_e$ depends on the quantizer through its model point density $\zeta$ and its model covariation profile $M$. In the sequel, we study the values of these parameters which reduce the loss $D_e$ as much as possible.

C. Determination of Relevant High-Rate Quantizers: Scalar case (d = 1) 

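We first address the case where measurements $(Y_k)_{k \ge 0}$ are real-valued. Assume without much loss of generality that each cell is connected (cells are intervals) i.e., the quantizer is regular [4]. In this case, a straightforward derivation leads to $M_N(y) = 1/12$ for each $y$ and each $N$. Therefore, function $F$ simplifies to:

$$F(y) = \frac{1}{12}\, E_0\left[ \ell(Y_{\mathbb{Z}})^2 \,\middle|\, Y_0 = y \right] = \frac{1}{12} \lim_{k \to \infty} E_0\left[ \left( \frac{\partial}{\partial y_0} \log \frac{p_0}{p_1}(Y_{-k:k}) \right)^2 \,\middle|\, Y_0 = y \right] .$$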

Using Hölder's inequality on (9), it is straightforward to prove the following result.

Corollary 2: Assume that $d = 1$ and that cells are intervals. The error exponent loss $D_e$ is such that:

$$D_e \ge \frac{1}{2} \left( \int [p_0(y) F(y)]^{1/3} \, dy \right)^{3} , \tag{12}$$

where equality holds in (12) when the model point density coincides with:

$$\zeta(y) = \frac{[p_0(y) F(y)]^{1/3}}{\int [p_0(s) F(s)]^{1/3} \, ds} .$$

The above corollary provides the optimal high-rate quantization rule for the initial hypothesis testing problem. Note that expression (12) is quite similar to [31, Equation (15)] which gives "the minimum distortion resulting with optimum level spacing" in an MSE perspective.
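The Hölder argument behind the corollary can be checked numerically. The sketch below is only an illustration under stated assumptions: it works with the functional $\int w(y)/\zeta(y)^2\,dy$ (the loss up to a constant factor, with $w$ standing for $p_0 F$), uses a made-up weight $w$ on $[-3, 3]$, and replaces integrals by Riemann sums. None of the names below come from the paper.

```python
import numpy as np

def loss_functional(weight, zeta, dy):
    """Riemann-sum approximation of the integral of weight / zeta**2 (scalar case, d = 1)."""
    return float(np.sum(weight / zeta**2) * dy)

def optimal_point_density(weight, dy):
    """Point density proportional to weight**(1/3), normalized to integrate to one."""
    w = np.cbrt(weight)
    return w / (np.sum(w) * dy)

# Hypothetical weight standing in for p0(y) * F(y); it is strictly positive on the grid.
y, dy = np.linspace(-3.0, 3.0, 6001, retstep=True)
weight = np.exp(-0.5 * y**2) * (0.2 + y**2)

zeta_opt = optimal_point_density(weight, dy)
zeta_uni = np.full_like(y, 1.0 / (len(y) * dy))   # uniform point density on the grid

loss_opt = loss_functional(weight, zeta_opt, dy)
loss_uni = loss_functional(weight, zeta_uni, dy)
holder_bound = float((np.sum(np.cbrt(weight)) * dy) ** 3)
```

With these definitions, `loss_opt` matches `holder_bound` up to rounding, and any other normalized density, such as `zeta_uni`, gives a larger value.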

Remark 9: In practice, an $N$-point scalar quantizer achieving a given model point density $\zeta$ can be easily implemented by means of a compander. Recall that a compander is defined as the composition of an invertible continuous function (the so-called compressor) and a uniform quantizer [3], [5]. To that end, it is sufficient to define the compressor as the primitive of $\zeta$ on the observation space. For example, if $\mathsf{Y}$ is the segment $[a,b] \subset \mathbb{R}$, define the compressor as $x \mapsto \int_a^x \zeta(t)\,dt$. Next, the output of the compressor is quantized using a uniform $N$-point quantizer on the interval $[0, 1]$. Under the assumption that $\zeta$ is a Lipschitz function, it is straightforward to show that the resulting sequence of quantizers satisfies Assumption 3 i.e., that it achieves the model point density $\zeta$.
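The compander construction can be sketched in a few lines of Python. This is a minimal illustration, assuming a strictly positive point density on $[a,b]$ evaluated on a fine grid; the function name `compander_quantizer`, the grid size and the choice of reproduction points (preimages of the uniform cell midpoints) are assumptions made for the example.

```python
import numpy as np

def compander_quantizer(zeta, a, b, n_levels, grid_size=10001):
    """Build an n_levels-point scalar quantizer on [a, b] from a point density zeta.

    Returns (quantize, codebook): `quantize` maps samples to reproduction points,
    `codebook` holds the n_levels reproduction points.
    """
    grid = np.linspace(a, b, grid_size)
    dens = np.asarray(zeta(grid), dtype=float)
    # Compressor: cumulative integral of zeta (trapezoidal rule), rescaled to [0, 1].
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (dens[1:] + dens[:-1]) * np.diff(grid))))
    cdf /= cdf[-1]
    # Reproduction points: preimages of the uniform cell midpoints under the compressor.
    midpoints = (np.arange(n_levels) + 0.5) / n_levels
    codebook = np.interp(midpoints, cdf, grid)

    def quantize(y):
        u = np.interp(np.clip(y, a, b), grid, cdf)                    # compress
        idx = np.minimum((u * n_levels).astype(int), n_levels - 1)    # uniform quantizer on [0, 1]
        return codebook[idx]

    return quantize, codebook

# With a constant density the compander reduces to a uniform quantizer.
quantize, codebook = compander_quantizer(lambda t: np.ones_like(t), 0.0, 1.0, 4)
```

A non-uniform `zeta` concentrates reproduction points where the density is large, which is the behaviour the remark describes.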

D. Determination of Relevant High-Rate Quantizers: Vector case ($d \ge 2$)

In the vector case, the determination of optimal high-rate quantization rules implies the joint minimization of expression (9) w.r.t. both functions $\zeta$ and $M$. Unfortunately, as remarked in [3], [32], it is not known what functions $M$ are allowable as covariation profiles. The determination of the set of admissible couples $(\zeta, M)$ is an open problem, which is beyond the scope of this paper.

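However, when $M$ is fixed, the point density $\zeta$ which minimizes $D_e$ can be easily expressed as a function of $M$. Once again, this is a consequence of Hölder's inequality:

$$D_e \ge \frac{1}{2} \left( \int [p_0(y) F(y)]^{\frac{d}{d+2}} \, dy \right)^{\frac{d+2}{d}} ,$$

where equality is achieved when the point density coincides with:

$$\zeta(y) = \frac{[p_0(y) F(y)]^{\frac{d}{d+2}}}{\int [p_0(s) F(s)]^{\frac{d}{d+2}} \, ds} . \tag{13}$$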

In other words, one can easily provide the optimal high-rate quantization rule for a given limiting covariation profile. Following the approach of [18], we study two special cases of covariation profile:

1) Congruent cells with minimum moment of inertia: In this paragraph, we focus on congruent cells with minimum moment of inertia i.e., we assume that

$$\forall y \in \mathsf{Y}, \quad M(y) = \nu\, I_d , \tag{14}$$

for some $\nu > 0$, where $I_d$ represents the $d \times d$ identity matrix.

Recall that Gersho [33] made the now widely accepted conjecture that when $N$ tends to infinity, most cells (i.e., all the cells except those which are close to the boundary of the considered domain) of a $d$-dimensional MSE-optimal quantizer become congruent to some tessellating $d$-dimensional polytope $H_d$. In such a case, $M(y)$ is independent of $y$. Furthermore, Zamir and Feder [34, Lemma 1] proved that the cells of the MSE-optimal lattice quantizers become "closer" to balls i.e., with minimum moment of inertia, as dimension $d$ grows.



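For quantizers with covariation profile given by (14), the optimal point density (13) becomes:

$$\zeta(y) = \frac{[p_0(y) F(y)]^{\frac{d}{d+2}}}{\int [p_0(s) F(s)]^{\frac{d}{d+2}} \, ds} , \tag{15}$$

where function $F$ is defined by

$$F(y) = E_0\left[ \| \ell(Y_{\mathbb{Z}}) \|^2 \,\middle|\, Y_0 = y \right] = \lim_{k \to \infty} E_0\left[ \left\| \nabla_{y_0} \log \frac{p_0}{p_1}(Y_{-k:k}) \right\|^2 \,\middle|\, Y_0 = y \right] . \tag{16}$$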



Design Algorithm: In practice, one would like to design an $N$-point quantizer whose point density approximately equals (15) for some finite $N$. This can be achieved by means of well-established algorithms, the most popular of them being the Linde-Buzo-Gray (LBG) algorithm [35]. This algorithm is an iterative method which computes a Voronoi tessellation, and yields an MSE-optimal $N$-point quantizer, from a training set of data of some pdf $p_0(y)$.

An ($N$-point) MSE-optimal quantizer $Q_N$ for density $p_0(y)$ minimizes $E_0\left[ \| Y_0 - Q_N(Y_0) \|^2 \right]$. As the number of quantization points $N$ tends to infinity, such a quantizer has the following model point density $\zeta_{\mathrm{MSE}}$ [6]:

$$\zeta_{\mathrm{MSE}}(y) = \frac{p_0(y)^{\frac{d}{d+2}}}{\int p_0(s)^{\frac{d}{d+2}} \, ds} . \tag{17}$$



Comparing Equations (15) and (17), we deduce that the proposed quantizer, whose model point density $\zeta$ is given by Equation (15), can be obtained in practice by simply feeding the classical LBG algorithm with a training set of data of the following pdf:

$$q^*(y) = \frac{p_0(y) F(y)}{\int p_0(s) F(s) \, ds} .$$

Section VII provides numerical illustrations of this approach.
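The design procedure above can be sketched as follows. This is a minimal illustration and not the authors' implementation: plain Lloyd iterations stand in for the LBG algorithm (no codebook splitting), the training set for $q^*$ is obtained by weighted resampling of draws from $p_0$ rather than by exact sampling, and the weight `1 + ||y||^2` is a made-up placeholder for $F$. All function names are hypothetical.

```python
import numpy as np

def lloyd(train, n_points, n_iter=50, seed=0):
    """Plain Lloyd / k-means iterations (a simplified stand-in for LBG)."""
    rng = np.random.default_rng(seed)
    codebook = train[rng.choice(len(train), size=n_points, replace=False)].copy()
    for _ in range(n_iter):
        # Nearest-neighbour (Voronoi) assignment of each training vector.
        d2 = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Centroid update; empty cells keep their previous reproduction point.
        for j in range(n_points):
            members = train[labels == j]
            if len(members) > 0:
                codebook[j] = members.mean(axis=0)
    return codebook

def reweighted_training_set(samples_p0, f_values, size, seed=0):
    """Resample draws from p0 with probabilities proportional to F (approximates q* ~ p0 * F)."""
    rng = np.random.default_rng(seed)
    prob = f_values / f_values.sum()
    idx = rng.choice(len(samples_p0), size=size, replace=True, p=prob)
    return samples_p0[idx]

# Hypothetical usage: p0 is a standard 2-D Gaussian, F is a made-up positive weight.
rng = np.random.default_rng(1)
samples = rng.standard_normal((5000, 2))
f_vals = 1.0 + (samples ** 2).sum(axis=1)
train_prop = reweighted_training_set(samples, f_vals, size=5000)
codebook_mse = lloyd(samples, n_points=16)
codebook_prop = lloyd(train_prop, n_points=16)
```

Running the same Lloyd routine on both training sets mirrors the comparison between (17) and (15): only the training distribution changes.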

2) Ellipsoidal cells: In order to yield some insights on the general shape of the cells, and following [18], we focus in this paragraph on ellipsoidal cells. This kind of cells cannot partition the considered convex subset $\mathsf{Y}$ of $\mathbb{R}^d$ but, for large dimension $d$, in analogy with the spherical cell approximation [3], [34], [36], we may assume that almost all cells of a given quantizer are close to ellipsoids.

Such an ellipsoidal cell, in the neighborhood of point $y$, writes $C = \{x : (x-y)^T R(y) (x-y) \le 1\}$, for some symmetric positive definite matrix $R(y)$. The corresponding covariation profile writes $M(y) = \nu\, |R(y)|^{1/d} R(y)^{-1}$ [18], [37], for some $\nu > 0$, and has an eigendecomposition

$$M(y) = U(y)\, \Phi(y)\, U(y)^T ,$$

where $\Phi(y) = \mathrm{Diag}(\phi_{1:d}(y))$ and $U(y)$ is an orthogonal matrix. Note that the (positive) eigenvalues $(\phi_i(y))_{i \in \{1,\dots,d\}}$ of $M(y)$ capture the relative importance of the axes of the ellipsoid $C$, while the columns $(u_i(y))_{i \in \{1,\dots,d\}}$ of $U(y)$ i.e., the eigenvectors of $M(y)$, indicate their respective directions.

In this paragraph, we assume that eigenvalues $(\phi_i)_{i \in \{1,\dots,d\}}$ are fixed, constant w.r.t. $y$ and, without loss of generality, sorted in increasing order i.e., $0 < \phi_1 \le \phi_2 \le \dots \le \phi_d$. We want to find the best orthogonal matrix $U(y)$ i.e., the one which minimizes function $F(y)$, given at Equation (10), in order to minimize the error exponent loss $D_e$ (9). In other words, for a given "shape" of (non-degenerate) ellipsoid, we look for the best directions of its axes. Function $F(y)$ writes:

$$F(y) = E_0\left[ \ell(Y_{\mathbb{Z}})^T M(Y_0)\, \ell(Y_{\mathbb{Z}}) \,\middle|\, Y_0 = y \right] = \mathrm{Tr}\left( U(y)\, \Phi\, U(y)^T L(y) \right) , \tag{18}$$

where $L(y) = E_0\left[ \ell(Y_{\mathbb{Z}})\, \ell(Y_{\mathbb{Z}})^T \,\middle|\, Y_0 = y \right]$. Now write the eigendecomposition of the positive definite matrix $L(y)$:

$$L(y) = V(y)\, \Lambda(y)\, V(y)^T ,$$

where $\Lambda(y) = \mathrm{Diag}(\lambda_{1:d}(y))$, $(\lambda_i(y))_{i \in \{1,\dots,d\}}$ are the (positive) eigenvalues of $L(y)$ sorted in increasing order i.e., $0 < \lambda_1(y) \le \lambda_2(y) \le \dots \le \lambda_d(y)$, and $V(y)$ is an orthogonal matrix.



Equation (18) thus writes:

$$F(y) = \mathrm{Tr}\left( U(y)\, \Phi\, U(y)^T V(y)\, \Lambda(y)\, V(y)^T \right) \ge \sum_{i=1}^{d} \phi_i\, \lambda_{d-i+1}(y) ,$$

where the last inequality follows from a well-known trace inequality for positive semidefinite Hermitian matrices [38], [39, Section 9-H]. The above lower bound is furthermore achieved by choosing matrix $U(y)$ such that $U(y)^T V(y)$ is the anti-diagonal matrix with ones on its anti-diagonal i.e., defining the $i$th column of matrix $U(y)$ as the $(d-i+1)$th column of matrix $V(y)$, or equivalently eigenvector $u_i(y)$ of matrix $M(y)$ as eigenvector $v_{d-i+1}(y)$ of matrix $L(y)$.

Footnote: For any given $d$-dimensional vector $x_{1:d} \in \mathbb{R}^d$, $\mathrm{Diag}(x_{1:d})$ represents the $d$-by-$d$ diagonal matrix with diagonal coefficients $(x_1, x_2, \dots, x_d)$.

From the above derivations, we conclude that if a cell is a non-degenerate ellipsoid around $y$ then its axes should be aligned along the ones of matrix $L(y)$ in the reverse order. In particular, its minor axis should be aligned along the principal eigenvector of matrix $L(y)$.
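The alignment rule can be verified numerically. The sketch below is only an illustration: it builds an arbitrary symmetric positive definite matrix in place of $L(y)$, pairs the eigenvectors in reverse order, and compares the resulting trace with the lower bound and with a random orientation. The function names are hypothetical.

```python
import numpy as np

def best_orientation(L):
    """Return (U, lam): U reverses the eigenvector order of L, lam are its ascending eigenvalues."""
    lam, V = np.linalg.eigh(L)    # ascending eigenvalues, eigenvectors as columns
    U = V[:, ::-1]                # i-th column of U = (d - i + 1)-th column of V
    return U, lam

def objective(U, phi, L):
    """Tr(U diag(phi) U^T L), the quantity minimized over orthogonal matrices U."""
    return float(np.trace(U @ np.diag(phi) @ U.T @ L))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
L = A @ A.T + np.eye(4)                    # arbitrary symmetric positive definite matrix
phi = np.array([0.5, 1.0, 2.0, 3.0])       # fixed eigenvalues, sorted increasingly

U, lam = best_orientation(L)
lower_bound = float(np.sum(phi * lam[::-1]))    # sum_i phi_i * lambda_{d-i+1}
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # some other orthogonal matrix
```

Here `objective(U, phi, L)` coincides with `lower_bound`, while `objective(Q, phi, L)` cannot fall below it.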

V. Proof of Theorem 2

A. Preliminaries

Recall that $V_{N,j} = \int_{C_{N,j}} dy$ is the volume of cell $C_{N,j}$ ($j \in \{1, \dots, N\}$). For each $i \in \{0, 1\}$ and each set of quantization points $\xi_{N,j_{1:n}} = (\xi_{N,j_1}, \dots, \xi_{N,j_n})$, define the following rescaled pdf of $Z_{N,1:n}$:

$$p_{i,N}(\xi_{N,j_{1:n}}) = \frac{P_i\left[ Z_{N,1:n} = \xi_{N,j_{1:n}} \right]}{V_{N,j_1} \times \dots \times V_{N,j_n}} = \frac{P_i\left( C_{N,j_1} \times \dots \times C_{N,j_n} \right)}{V_{N,j_1} \times \dots \times V_{N,j_n}} . \tag{19}$$

The above definition is convenient because $p_{i,N}(\xi_{N,j_{1:n}}) \simeq p_i(\xi_{N,j_{1:n}})$ for large values of $N$. This approximation will be of prime importance in the sequel. We define function $p_{i,N}(\xi_{N,j_n} \,|\, \xi_{N,j_{1:n-1}})$ similarly.

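For each $i \in \{0, 1\}$ and each integer $\ell \ge 0$, we introduce the following functions:

$$\forall y_{-\ell:0} \in \mathsf{Y}^{\ell+1}, \quad \mathcal{L}_i(y_{-\ell:0}) = \log p_i(y_0 \,|\, y_{-\ell:-1}) ,$$

with $\mathcal{L}_{i,N}$ defined similarly from the rescaled pdf $p_{i,N}$. Due to Assumptions 1-3) and 4-3) (which ensures that $\eta_i(0) < \infty$), random sequence $(\mathcal{L}_i(Y_{-\ell:0}))_{\ell \ge 0}$ lies in $L^1(P_0)$. Moreover, Assumption 4-3) for large $m$ ensures that sequence $(\mathcal{L}_i(Y_{-\ell:0}))_{\ell \ge 0}$ is a Cauchy sequence of $L^1(P_0)$. Denote by $\mathcal{L}_i(Y_{-\infty:0})$ its limit. From Assumption 4-3) once again, the following holds for any $\ell \ge 0$,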

Eo|£.(r_,:o) - A:(r-oo:0)| < (J^^. " (20) 

A similar result holds for sequence $(\mathcal{L}_{i,N}(Z_{N,-\ell:0}))_{\ell \ge 0}$ which converges in $L^1(P_0)$ to some random variable $\mathcal{L}_{i,N}(Z_{N,-\infty:0})$ and verifies for any $\ell \ge 0$,

C 

Our aim is to study the difference $K - K_N$ between error exponents associated with the ideal and quantized cases respectively. We may write the difference as

$$K - K_N = (K_0 - K_{0,N}) - (K_1 - K_{1,N}) , \tag{22}$$

where we defined for each $i \in \{0, 1\}$,

$$K_i = E_0\left[ \mathcal{L}_i(Y_{-\infty:0}) \right] , \qquad K_{i,N} = E_0\left[ \mathcal{L}_{i,N}(Z_{N,-\infty:0}) \right] .$$

In the sequel, we focus on the study of $K_1 - K_{1,N}$, the study of $K_0 - K_{0,N}$ being similar.

We now proceed with the proof. Choose any e' such that < e' < . Define the sequence 

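of integers $m = m(N) = \lfloor N^{1/(3d) - \epsilon'} \rfloor$. With this definition,

$$\lim_{N \to \infty} \frac{m^3}{N^{1/d}} = 0 . \tag{23}$$

The following decomposition holds true: $K_{1,N} = K_1 + T_N + U_N + \delta_N$, where we defined:

$$T_N = E_0\left[ \mathcal{L}_{1,N}(Z_{N,-m:0}) - \mathcal{L}_1(Z_{N,-m:0}) \right] ,$$
$$U_N = E_0\left[ \mathcal{L}_1(Z_{N,-m:0}) - \mathcal{L}_1(Y_{-m:0}) \right] ,$$
$$\delta_N = E_0\left[ \mathcal{L}_{1,N}(Z_{N,-\infty:0}) - \mathcal{L}_{1,N}(Z_{N,-m:0}) \right] + E_0\left[ \mathcal{L}_1(Y_{-m:0}) - \mathcal{L}_1(Y_{-\infty:0}) \right] .$$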



Using Equations (20) and (21), it is straightforward to show that

$$N^{2/d} |\delta_N| \le \frac{C\, N^{2/d}}{(1 + m)^{6 + \epsilon}}$$

for some constant $C$. By definition of $m = m(N)$, we deduce that $N^{2/d} |\delta_N|$ converges to zero as $N \to \infty$. As a consequence, the asymptotic analysis of quantity $N^{2/d}(K_{1,N} - K_1)$ reduces to the study of $T_N$ and $U_N$.

As $\mathsf{Y}$ is a bounded set, Assumption 4-2) implies the following bounds on the derivatives of density $p_i$, which will be used repeatedly in the sequel:



sup ||Vj,,logpi(?/i:n)|| < Ci , 

{yi:neY", l<fc<n} 

\\^yk^^ZPl{yi:n)\\ < <^2 , 

{»/l:neY", l<fc<n} 



(24) 
(25) 



for some constants Ci and C2. 



B. Study of $T_N$

We expand $T_N$ as follows:

$$T_N = E_0\left[ \log \frac{p_{1,N}(Z_{N,-m:0})}{p_1(Z_{N,-m:0})} \right] - E_0\left[ \log \frac{p_{1,N}(Z_{N,-m:-1})}{p_1(Z_{N,-m:-1})} \right] . \tag{26}$$

We now study each term of the r.h.s. of the above equality. Consider $u \in \{-1, 0\}$. Writing the Taylor-Lagrange expansion of function $y_{-m:u} \mapsto p_1(y_{-m:u})$ at point $\xi_{N,j_{-m:u}}$, using Assumptions 3-3), 4 and the properties of the quantizers sequence, we prove the following lemma (the detailed proof is given in Appendix A).

Lemma 2: For each j-rn;u G {1, . . . , iV}"+™+^, the following expansion holds true: 

where |eArj_„^„| < ct (fr/z)^ for some constant ct- 



J — m:u ' 



Plugging the above equation into (26), using $|\log(1 + x) - x| \le x^2$ in a neighborhood of zero, Assumptions 3, 4-2) and Equation (23), we obtain:

$$T_N = T_N(0) - T_N(-1) + o_N(N^{-2/d}) , \tag{27}$$

where, for each $u \in \{-1, 0\}$,



k=—m 



Tr ' 



(28) 



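C. Study of $U_N$

We expand $U_N$ as follows:

$$U_N = E_0\left[ \log p_1(Z_{N,-m:0}) - \log p_1(Y_{-m:0}) \right] - E_0\left[ \log p_1(Z_{N,-m:-1}) - \log p_1(Y_{-m:-1}) \right] , \tag{29}$$

and study each term of the r.h.s. of the above equality. For each $u \in \{-1, 0\}$ and each $j_{-m:u} \in \{1, \dots, N\}^{u+m+1}$, we expand function $y_{-m:u} \mapsto \log p_1(y_{-m:u})$ at point $\xi_{N,j_{-m:u}}$:

$$\log p_1(y_{-m:u}) = \log p_1(\xi_{N,j_{-m:u}}) + \sum_{k=-m}^{u} \nabla_{y_k} \log p_1(\xi_{N,j_{-m:u}})^T (y_k - \xi_{N,j_k}) + \frac{1}{2} \sum_{k,\ell=-m}^{u} (y_k - \xi_{N,j_k})^T\, \nabla^2_{y_k, y_\ell} \log p_1(\xi_{N,j_{-m:u}})\, (y_\ell - \xi_{N,j_\ell}) + \epsilon'_N(y_{-m:u}) . \tag{30}$$

Under Assumptions 3-3) and 4-2), for each $y_{-m:u} \in C_{N,j_{-m}} \times \dots \times C_{N,j_u}$, the remainder is such that

$$|\epsilon'_N(y_{-m:u})| \le \frac{(m+1)^3\, c_3}{N^{3/d}}$$

for some constant $c_3$. By Equation (23), the r.h.s. of the above inequality converges to zero as $N$ tends to infinity faster than $N^{-2/d}$. Plugging Taylor expansion (30) into the expression (29) of $U_N$, we obtain:

$$U_N = U_N(0) - U_N(-1) + o_N(N^{-2/d}) , \tag{31}$$

where, for each $u \in \{-1, 0\}$,

$$U_N(u) = - E_0\left[ \sum_{k=-m}^{u} \nabla_{y_k} \log p_1(Z_{N,-m:u})^T (Y_k - Z_{N,k}) \right] - \frac{1}{2} E_0\left[ \sum_{k,\ell=-m}^{u} (Y_k - Z_{N,k})^T\, \nabla^2_{y_k, y_\ell} \log p_1(Z_{N,-m:u})\, (Y_\ell - Z_{N,\ell}) \right] . \tag{32}$$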



The next step is to study each dominant term of the r.h.s. of (32). The proof of the following lemma is provided in Appendix B.

Lemma 3: The following equality holds true for each u E { — 1,0}: 

where and 5 at are defined as follows: 

1 



k=—m 



Bn{u) 



1 

7^ E^c 



—m:i 

Tr I logpi(ZAr_m:„ 



MM{Yk 



k=—m 



CN{Yky/' 



V,fe logpo(^~ 



(33) 



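Now we expand the term $\nabla^2_{y_k} \log p_1$ as follows:

$$\nabla^2_{y_k} \log p_1(y_{-m:u}) = \frac{\nabla^2_{y_k} p_1(y_{-m:u})}{p_1(y_{-m:u})} - \frac{\nabla_{y_k} p_1(y_{-m:u})\, \nabla_{y_k} p_1(y_{-m:u})^T}{\left( p_1(y_{-m:u}) \right)^2} .$$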

From the above decomposition and Equation ([28]), we can divide Bn{u) into two terms: 



Bn{u) 



k=—m 



Tr Vj,,, logpi(Ziv_„;„) Vj,^ logpi(Ziv_^;„) 



Expanding function $\log p_1$ in the above equation and in (33), we can write dominant terms in a simple form i.e., replace each $Z_N$ by $Y$. Under Assumption 3, from Equations (25) and (23), we can easily prove that the corresponding remainders are $o_N(N^{-2/d})$.
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Putting all pieces together, we obtain 
1 " 



k=—m 



V,^^ logpi(y_m:„) , , , logPo(Y-n^:^J 



+ 



7^ 



k=—m 



Vy^ \0gPi{Y_m:uy y^^p^^ Vj,,. logPi(F_^;„) 



Tm{u) + on{N-^''') . (34) 



D. End of the Proof 



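From the results of Sections V-B and V-C, we can easily prove the following lemma.

Lemma 4: The following holds true:

$$N^{2/d}(K - K_N) = E_0\left[ \mathcal{H}_{N,0}(Y_{-m:0}) \right] + \sum_{k=-m}^{-1} E_0\left[ \mathcal{H}_{N,k}(Y_{-m:0}) - \mathcal{H}_{N,k}(Y_{-m:-1}) \right] + o_N(1) , \tag{35}$$

where for each $u \in \{-1, 0\}$, each $m \ge 1$ and each $k \in \{-m, \dots, u\}$:

$$\mathcal{H}_{N,k}(y_{-m:u}) = \frac{1}{2}\, \nabla_{y_k} \log \frac{p_0}{p_1}(y_{-m:u})^T\, \frac{M_N(y_k)}{\zeta_N(y_k)^{2/d}}\, \nabla_{y_k} \log \frac{p_0}{p_1}(y_{-m:u}) . \tag{36}$$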



Proof: Recalling the decomposition: Ki^n = Ki + + Un + on{N '^I'^) and gathering 



Equations (27), (31), ([34]), it is straightforward to prove the following equality: 



k=—m 



1 ° 
k=—m 



CNiYky/" 



V7 1 NT ^N(Yk) t^r \ 

Vy, iogpi(y_^;o) ^^{Ykf/'^ l°gPH^-"^^o) 



+ E ^0 

fc=— m 



Vy^ log]9i(F_m:_i)^ /^^^n/. logPoCi^-m:-!) 



CNiY,f'' 



fc=— m 



V,, logpi(y_„.,i)' ^tM-, logPi(F_^:_i^ 



+ ojv(l) 



A similar expression holds for $N^{2/d}(K_{0,N} - K_0)$: replace all $p_1$ by $p_0$ in the above equation. Lemma 4 follows from decomposition (22). ■
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We now study the series ([35]). From Assumptions [3} |4]-2) and |4]-4), the following forgetting 
properties hold true for any positive integers £ and any integers k, u s.t. —i' < —i < k < u: 

Eo |^iV,fe(>^-^;«) - 'HAr,fe(y_^':„)| < C/,V2^_|fc|, (37) 

Eo\'HN,k{y~i:o)-'H-N,k{y~t-l)\ < Ch^\k\ , (38) 

for some constant Ch- 

It is clear from ( [37] ) that sequence {'HN,kiY-tu))) is a Cauchy sequence in L^(Po). We 



simply denote its limit by 'HN,kiY_ao:u)- Inequalities ( [37] ) and ( [38] ) provide the main tools for the 
asymptotic analysis of series ([35]). The proof of the following lemma is given in Appendix [C] 
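Lemma 5: The following holds true:

$$N^{2/d}(K - K_N) = E_0\left[ \mathcal{H}_{N,0}(Y_{-\infty:0}) \right] + \sum_{k=-\infty}^{-1} E_0\left[ \mathcal{H}_{N,k}(Y_{-\infty:0}) - \mathcal{H}_{N,k}(Y_{-\infty:-1}) \right] + o_N(1) .$$

As process $(Y_k)_{k \in \mathbb{Z}}$ is stationary, the expectation $E_0$ enclosed in the sum of the above equation is invariant w.r.t. a time-shift. Each term of index $k$ can thus be shifted by $-k$, so that the sum telescopes, and we obtain

$$N^{2/d}(K - K_N) = \lim_{k \to \infty} E_0\left[ \mathcal{H}_{N,0}(Y_{-\infty:k}) \right] + o_N(1) . \tag{39}$$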
For a fixed $k \ge 0$, Equation (7) ensures that sequence $\left( \nabla_{y_0} \log \frac{p_0}{p_1}(Y_{-m:k}) \right)_{m \ge 0}$ is a Cauchy sequence in $L^1(P_0)$. Denote its limit by $\ell_k(Y_{-\infty:k})$. The upper bound of Equation (8) is uniform in $m$. Consequently, it also holds for sequence $(\ell_k(Y_{-\infty:k}))_{k \ge 0}$:

$$E_0 \left\| \ell_k(Y_{-\infty:k}) - \ell_{k-1}(Y_{-\infty:k-1}) \right\| \le \varphi_k .$$

Under Assumption 4-4), $\sum_k \varphi_k$ is a convergent series. Sequence $(\ell_k(Y_{-\infty:k}))_{k \ge 0}$ is thus a Cauchy sequence in $L^1(P_0)$. Denote its limit by $\ell(Y_{\mathbb{Z}})$. Moreover, the upper bound of Equation (7) (resp. Equation (8)) is uniform in $m'$ (resp. $m$). It is then straightforward to prove that $\ell(Y_{\mathbb{Z}})$ coincides with the $L^1(P_0)$-limit of sequence $\left( \nabla_{y_0} \log \frac{p_0}{p_1}(Y_{-k:k}) \right)_{k \ge 0}$.

From Equation (24) and its counterpart for density $p_0$, quantity $\nabla_{y_0} \log \frac{p_0}{p_1}(Y_{-k:k})$ is uniformly bounded. Consequently, the above limit also holds in the $L^2(P_0)$-sense:

$$\nabla_{y_0} \log \frac{p_0}{p_1}(Y_{-k:k}) \xrightarrow[k \to \infty]{L^2(P_0)} \ell(Y_{\mathbb{Z}}) . \tag{40}$$

Plugging Equations (36) and (40) in Equation (39) and letting $N$ tend to $\infty$ complete the proof of Theorem 2.

VI. Illustration: Case of a Hidden Markov Process

In this section, we translate our assumptions in the case of (discrete-time) hidden Markov models. For such models, they reduce to simpler conditions on the transition kernel of the underlying Markov chain, and on the observation kernel. This context, where the measurements are noisy samples of a certain Markov source, has raised a deep interest in the recent literature



on sensor networks (see p3| , p4| and reference therein). 

Consider a stationary Markov process $(X_k)_{k \ge 0}$ taking its values in an arbitrary state space $\mathsf{X}$, and playing the role of a source signal to be detected. For each $i \in \{0,1\}$ and each integer $t$, we assume that the (iterated) transition kernel $P_i[X_{k+t} \in \cdot \,|\, X_k = x]$ admits a density $x' \mapsto q_i^t(x, x')$ w.r.t. some probability measure $\lambda$ on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$. Assume that there exist an integer $m$, and two real numbers $\sigma_-$, $\sigma_+$ s.t., for each $i \in \{0, 1\}$ and each $(x, x') \in \mathsf{X}^2$, $0 < \sigma_- \le q_i^m(x, x') \le \sigma_+$. In particular, this assumption implies that the Markov chain $(X_k)_{k \in \mathbb{Z}}$ has bounded support.

If the state space $\mathsf{X}$ is finite, the above conditions hold if the Markov chain $(X_k)_{k \in \mathbb{Z}}$ is irreducible aperiodic, choosing $\lambda$ as the (normalized) counting measure on $\mathsf{X}$. In this case, the chain indeed admits a stationary distribution, and $q_i^m(x, x') > 0$ for each $x$, $x'$ and some integer $m$ [40, Section 8].

The states $X_k$ of the above Markov source are supposed to be hidden. However, a "noisy" version $Y_k$ ($\in \mathsf{Y} \subset \mathbb{R}^d$) of $X_k$ is available at the $k$th sensor. We assume that the distribution $P[Y_k \in \cdot \,|\, X_k = x]$ does not depend on the hypothesis H0 or H1, and admits a density $y \mapsto g(x, y)$ w.r.t. the $d$-dimensional Lebesgue measure $\mu$ restricted to $\mathsf{Y}$, such that $0 < \inf_{x,y} g(x,y) \le \sup_{x,y} g(x,y) < \infty$. We furthermore assume that this density verifies some smoothness conditions: for each $x \in \mathsf{X}$, $y \mapsto g(x, y)$ is of class $C^3$ on $\mathsf{Y}$, and

$$\sup_{x \in \mathsf{X},\ y \in \mathsf{Y},\ 1 \le i,j,l \le d} \left| \frac{\partial^3 g}{\partial y^{(i)}\, \partial y^{(j)}\, \partial y^{(l)}}(x, y) \right| < \infty .$$

This situation is depicted in Figure 1.

A similar assumption was recently introduced in [41], [42] in order to study the asymptotic behaviour of the log-likelihood $\log p_i(Y_{1:n})$ as $n$ tends to infinity. In particular, it was shown that:

|logpi(lo|>^-m:-l) -logPi(Fo|>^-m':-l)| < 



1 — cr^/(T+ 

for each $m' > m > 0$. This clearly proves that sequence $\log p_i(Y_0 \,|\, Y_{-m:-1})$ converges in $L^1(P_0)$ as $m \to \infty$ and yields Assumption 2. Moreover, the convergence holds at exponential speed, meaning that quantities $\eta_i(m)$, defined earlier, vanish faster than any polynomial rate in $m$. The same claim holds as well for quantities $\eta_{i,N}(m)$, without need for any special condition on the quantizer (quantization preserves the hidden Markov nature of the original process $(Y_k)_{k \in \mathbb{Z}}$). This yields Assumption 4-3).

Figure 1: Detection of a discrete-time Markov process based on noisy observations.

Assumptions 4-1) and 4-2) are direct consequences of the above smoothness conditions on density $g$. Assumption 4-4) can be derived following the arguments of [41], [42].

The following proposition then follows from the results of [41], [42]. The proof is therefore omitted.

Proposition 1: All conditions given by Assumptions 1 and 4 hold true for the particular process $(Y_k)_{k \in \mathbb{Z}}$ described in this section.

As a consequence, if the family of quantizers moreover verifies Assumption 3, then the conclusions of Theorems 1 and 2 hold true.



Section VII-A below provides a practical example of such a detection problem. 



VII. Numerical Results

In this section, we provide numerical illustrations of the proposed quantization rule in terms of geometric properties and performance. Different contexts are considered and we compare several quantizers:

• The proposed quantizer, obtained using the approach described in Section IV-D1 and whose model point density is given by (15).

• The MSE-optimal quantizer, which minimizes $E_0 \| Y_0 - Z_{N,0} \|^2$ and whose model point density is given by (17).

• Gupta-Hero quantizer, introduced in [18]: In this case the model point density is drawn as if observations were i.i.d. i.e., only taking the marginal distributions $p_0(y)$ and $p_1(y)$ into account.

• The uniform quantizer with constant model point density.



A. Scenario #1: Detection of Quaternary Modulations: QPSK vs. OQPSK 

In this section, we provide an example of hidden Markov models which verify the assumptions given in Section VI and detail how to use in this case the approach described in Section IV-D1 for the design of practical quantizers.

1) Observation Model: We consider the following model for vector observations with dimension $d = 2$:

$$Y_k = T(X_k) + W_k , \tag{41}$$

where $(X_k)_{k \in \mathbb{Z}}$ is a 2-bit message, which takes values in $\mathsf{X} = \{0,1,2,3\}$, $T(x)$ is the 2-D representation of state $x$ in the I-Q plane according to Figure 2 and $W_k \sim \mathcal{CN}(0, \sigma^2)$ represents a zero mean circular Gaussian thermal noise with variance $\sigma^2$. Process $(X_k)_{k \in \mathbb{Z}}$ is i.i.d., uniformly distributed under H0, and forms a Markov chain under H1. More precisely,

H0 : $X_k \overset{\text{i.i.d.}}{\sim} \mathcal{U}_{\{0,1,2,3\}}$
H1 : $X_0 \sim \mathcal{U}_{\{0,1,2,3\}}$, $P_1[X_{k+1} = x' \,|\, X_k = x] = q(x, x')$ ,

where $q$ is the transition matrix of the Markov chain and is given by:

1/3 1/3 1/3 
1/3 1/3 1/3 
1/3 1/3 1/3 
1/3 1/3 1/3 

This situation arises when testing from noisy observations between two possible quaternary modulations, namely quadrature phase-shift keying (QPSK) and offset quadrature phase-shift keying (OQPSK), in the In-phase/Quadrature plane [43, Chapter 3]. The corresponding constellations are depicted in Figure 2.

Footnote: $T(0) = [-1; -1]$, $T(1) = [-1; 1]$, $T(2) = [1; 1]$, $T(3) = [1; -1]$.

In the observation model (41), densities have infinite support. We thus consider truncated observations on $\mathsf{Y} = [-M; M]^2$ for some positive real number $M$ [44, Section 10.1]. The new (truncated) model is a hidden Markov model with observation density $g(x, y)$ given by:

g{x,y)=^^=^^^f^exp(^{y-T{x)y{y-T{x))) , (42) 

where 1^ stands for the indicator function of set A, and Cuip) is a constant such that Jyg{x, y)dy 
1, for each x G {0, 1, 2, 3} i.e., Cm{o-) = ^ J_|^^^ exp ^■ 



dt 



2(72 

The above hidden Markov model verifies the assumptions given in Section VI. From Proposition 1, if the family of quantizers verifies Assumption 3, then the conclusions of Theorems 1 and 2 hold true.

Note that the marginal pdf of the measurements $(Y_k)_{k \ge 0}$ (represented in Figure 3) writes

$$p_0(y) = p_1(y) = \frac{1}{4} \sum_{x=0}^{3} g(x, y) . \tag{43}$$

Since it does not depend on the hypothesis, Gupta-Hero quantizer [18], which minimizes the error exponent loss in case of i.i.d. observations, is not defined.

Figure 3: QPSK vs. OQPSK - Marginal pdf of the observations $p_0(y) = p_1(y)$ ($M = 3$, $\sigma = 0.6$).



2) Examples of Quantizers: Figure 4(a) represents the MSE-optimal 128-cell quantizer obtained by the LBG algorithm, setting $M = 3$ and $\sigma = 0.6$. Figure 4(b) represents the corresponding proposed quantizer. Our quantizer is significantly different from the MSE-optimal one. Some low probability points turn out to be significant for the considered detection problem. Details on how we obtained these quantizers are given below.

a) MSE-optimal quantizer: The MSE-optimal quantizer of Figure 4(a) was obtained by feeding the LBG algorithm with 20 000 samples following distribution $P_0$ i.e., i.i.d. with pdf $p_0(y)$ (see Figure 3).



b) Proposed quantizer: As noted in Section IV-D1, the proposed quantizer, whose model point density $\zeta$ is given by Equation (15), can be obtained by simply feeding the LBG algorithm with observations drawn from the following pdf:

$$q^*(y) = \frac{p_0(y) F(y)}{\int p_0(s) F(s) \, ds} .$$

We simulated 20 000 samples of this pdf using rejection sampling [45, Section 2.2]. In practice, we approximated function $F$ given by Equation (16) by:

$$F_k(y) = \frac{1}{n_{MC}} \sum_{j=1}^{n_{MC}} \left\| \nabla_{y_0} \log \frac{p_0}{p_1}\left( Y_{-k:-1}(j),\, y,\, Y_{1:k}(j) \right) \right\|^2 \tag{44}$$

for $k = 3$ and $n_{MC} = 1\,000$ replications $(Y_m(j))_{m \in \{-k, \dots, -1, 1, \dots, k\},\ j \in \{1, \dots, n_{MC}\}}$ i.e., 6 000 i.i.d. samples with pdf $p_0$. These values were chosen based on empirical observations.
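The rejection sampling step can be sketched as follows. This is a generic illustration, assuming that one can draw from $p_0$ and that an upper bound `f_max` on $F$ is known; the helper name `rejection_sample` and the toy target used in the usage lines (uniform $p_0$ on $[0,1]$ with weight $F(y) = y$) are assumptions made for the example, not the setting of the paper.

```python
import numpy as np

def rejection_sample(draw_p0, f, f_max, size, seed=0, batch=4096):
    """Draw `size` samples from q(y) proportional to p0(y) * f(y).

    `draw_p0(rng, n)` must return n draws from p0, and f must satisfy 0 <= f <= f_max.
    """
    rng = np.random.default_rng(seed)
    accepted = []
    count = 0
    while count < size:
        cand = draw_p0(rng, batch)
        # Accept each candidate with probability f(candidate) / f_max.
        keep = rng.uniform(size=len(cand)) * f_max <= f(cand)
        accepted.append(cand[keep])
        count += int(keep.sum())
    return np.concatenate(accepted)[:size]

# Toy usage: p0 uniform on [0, 1] and f(y) = y, so the target density is 2y (mean 2/3).
toy = rejection_sample(lambda rng, n: rng.uniform(size=n), lambda y: y, 1.0, 20000)
```

In the setting of this section, `f` would be replaced by the Monte Carlo approximation $F_k$ of (44), which makes each acceptance test noticeably more expensive.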

The gradient in the above equation may be written as follows, after some derivations, and using Equations (42), (43):

$$\nabla_{y_0} \log \frac{p_0}{p_1}(y_{-k:k}) = \nabla_{y_0} \log p_0(y_0) - \nabla_{y_0} \log p_1(y_{-k:k})$$



As they are finite sums on X or X^'^^^, the above four expectations are exactly computed at the 



El 




Vj) 


El 







time of the evaluation of (44). 



B. Scenario #2: Detection of an AR Structure in Gaussian 2-D Signals 

We consider the following model for vector observations with dimension $d = 2$:

$$Y_k = X_k + W_k ,$$

where $W_k \sim \mathcal{CN}(0, \sigma^2)$ represents a zero mean circular Gaussian thermal noise with variance $\sigma^2$, and where $(X_k)_{k \in \mathbb{Z}}$ is a Gaussian process which is white under H0 and correlated (AR-1) under H1. More precisely,

H0 : $X_k \overset{\text{i.i.d.}}{\sim} \mathcal{CN}(0, 1)$
H1 : $X_k = a X_{k-1} + \sqrt{1 - a^2}\, U_k$ ,

where $a \in (0, 1)$ is the correlation coefficient and $U_k \overset{\text{i.i.d.}}{\sim} \mathcal{CN}(0, 1)$ is the innovation process. In particular, $(Y_k)_{k \in \mathbb{Z}}$ is a white Gaussian process under H0 and is a hidden Markov process under H1, with the particular property that the marginal distributions of single observations are identical under both hypotheses.
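The fact that single observations have the same marginal distribution under both hypotheses, while only H1 introduces temporal correlation, can be illustrated by simulation. The sketch below is an assumption-laden illustration: complex samples are mapped to 2-D real vectors, the chain is started in its stationary distribution, and the function names are hypothetical.

```python
import numpy as np

def complex_normal(rng, size, variance=1.0):
    """Circular complex Gaussian samples with E|z|^2 = variance."""
    return np.sqrt(variance / 2.0) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

def simulate(n, a, sigma, hypothesis, seed=0):
    """Simulate Y_k = X_k + W_k under H0 (white X) or H1 (AR-1 X with coefficient a)."""
    rng = np.random.default_rng(seed)
    w = complex_normal(rng, n, sigma**2)
    if hypothesis == 0:
        x = complex_normal(rng, n)
    else:
        u = complex_normal(rng, n)
        x = np.empty(n, dtype=complex)
        x[0] = u[0]                                   # start in the stationary law CN(0, 1)
        for k in range(1, n):
            x[k] = a * x[k - 1] + np.sqrt(1.0 - a**2) * u[k]
    y = x + w
    return np.column_stack((y.real, y.imag))          # 2-D real representation (d = 2)

y_h0 = simulate(50000, a=0.8, sigma=1.0, hypothesis=0)
y_h1 = simulate(50000, a=0.8, sigma=1.0, hypothesis=1)
lag1_h0 = float(np.mean(y_h0[1:, 0] * y_h0[:-1, 0]))  # close to 0 under H0
lag1_h1 = float(np.mean(y_h1[1:, 0] * y_h1[:-1, 0]))  # close to a / 2 under H1
```

Both data sets have the same per-component variance $(1 + \sigma^2)/2$, so a quantizer designed from the marginals alone cannot distinguish the hypotheses; only the lag-one statistic differs.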

We mention that in the above model, densities have infinite support so that the assumptions made in this paper are not satisfied (the observation set $\mathsf{Y}$ coincides with $\mathbb{R}^2$ and is thus unbounded). In particular, Theorem 2 does not apply. Nevertheless, in order to yield some insights on the design of practical quantizers for detection, we can still use the approach described in Section IV-D1 and compute the proposed model point density given by Equation (15).

Figure 5(a) represents the MSE-optimal 64-cell quantizer obtained by the LBG algorithm (with a 20 000-sample training set of data), setting $\sigma = 1$. Figure 5(b) represents the corresponding proposed quantizer obtained when setting $a = 0.8$. Once again, our quantizer is significantly different from the MSE-optimal one. As a matter of fact, low probability points seem to be significant for the considered detection problem.

Table I compares the latter two quantization rules and the uniform one (on the rectangle $[-8; 8]^2$) in terms of quantity $D_e$ (9). As expected, the proposed quantization rule leads to the lowest value. We can expect that it will also lead to higher detection performance.

Table I: Detection of an AR structure - Quantity $D_e$ for parameter values $a = 0.8$ and $\sigma = 1$.

Quantization rule     Uniform on $[-8; 8]^2$     MSE-optimal     Proposed one
Quantity $D_e$        8.211                      2.255           2.112



C. Scenario #3: Detection of a Scalar MA Process in Noise 

Denote by $Y_k$ the samples collected by a receiver which makes a binary test associated with the following hypotheses:

H0 : $Y_k = W_k$ ,
H1 : $Y_k = \sum_{i=0}^{L} h_i U_{k-i} + W_k$ ,

where $W_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$ represents a thermal noise which is supposed to be real-valued for the sake of illustration. Here, $U_k$ represents a certain random source which is passed through a propagation channel with deterministic real coefficients $h_0, \dots, h_L$, where $L$ is an integer which represents the channel's memory. In the sequel, we set $L = 3$. Assume for instance that $U_k$ is Gaussian distributed, $U_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$. We investigate the case where the sensing unit performs a scalar quantization of the received signal before transmission to the decision device.
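Under H1 the observations are Gaussian with a finite-range autocovariance, which is what makes the process short-dependent. As a small illustration, the sketch below computes this autocovariance for given channel taps, assuming a white unit-variance source and white noise of variance $\sigma^2$ as above; the function name is hypothetical.

```python
import numpy as np

def ma_autocovariance(h, sigma):
    """Autocovariance r(0), ..., r(len(h) - 1) of Y_k = sum_i h_i U_{k-i} + W_k.

    U is white with unit variance, W is white with variance sigma**2;
    r(t) vanishes for lags t >= len(h).
    """
    h = np.asarray(h, dtype=float)
    r = np.array([np.dot(h[: len(h) - t], h[t:]) for t in range(len(h))])
    r[0] += sigma**2     # the noise only contributes at lag zero
    return r

# Toy usage with two taps: r(0) = 1 + 0.25 + 1 and r(1) = 0.5.
r_toy = ma_autocovariance([1.0, 0.5], sigma=1.0)
```

Under H0 the same computation reduces to $r(0) = \sigma^2$ and zero elsewhere, so the two hypotheses differ through both the variance and the correlation structure.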



As in Section VII-B, densities in the above model have infinite support, so the assumptions
made in this paper are not satisfied. Once again, in order to gain some insight into the design
of practical quantizers for detection, we can still use the approach described in Section IV-D1
and compute the proposed model point density given by Equation ([T?]).*



*In this case, we approximated function F (16) for finite k and exactly computed the involved expectation.

Figure 6: Detection of an MA process - Probability and model point densities (h =
[1.06677, -0.59281, 0.09565], $\sigma$ = 1.5).



For the same reason, the result of Gupta and Hero [18, Equation (20)] does not apply,
but we can compute the corresponding quantizer, whose model point density is given by [18,
Equation (25)], as they did for their Gaussian examples in [18, Section V].

The performance depends on the noise variance and on the particular realization of the channel.
We therefore assumed that the channel coefficients $h_0, \dots, h_L$ are i.i.d. Gaussian distributed with zero
mean and unit variance, and ran several simulations.

Figure 6 represents the probability and model point densities for one channel realization, i.e.,
h = [1.06677, -0.59281, 0.09565], with $\sigma$ = 1.5.

Considering a system with n = 80 sensors, constructing 4-cell quantizers with the different methods,
and computing the corresponding quantized probability distributions under each hypothesis, we
can compare the considered quantization rules in terms of detection performance through their
respective receiver operating characteristic (ROC) curves. Figure 7 represents such curves for
the above channel realization. The uniform quantizer is used on the support $[-10\sigma, 10\sigma]$. Each
curve is plotted using 50 000 samples of the log-likelihood ratio (LLR) under each hypothesis.
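To make the ROC construction concrete, here is a minimal sketch of how an empirical ROC curve can be obtained from two sets of LLR samples. It assumes the convention that the LLR is log(p1/p0) and that H1 is decided when the LLR exceeds a threshold; the function name and the Gaussian placeholder samples are hypothetical, and the computation of the LLR of the quantized correlated process itself is not shown.

```python
import numpy as np

def empirical_roc(llr_h0, llr_h1):
    """Sweep the LLR threshold and return (false-alarm, detection) probabilities."""
    # Candidate thresholds: all observed LLR values, from the largest to the smallest.
    thresholds = np.sort(np.concatenate([llr_h0, llr_h1]))[::-1]
    s0 = np.sort(llr_h0)
    s1 = np.sort(llr_h1)
    # Fraction of samples strictly above each threshold under each hypothesis.
    pfa = 1.0 - np.searchsorted(s0, thresholds, side="right") / len(s0)
    pd = 1.0 - np.searchsorted(s1, thresholds, side="right") / len(s1)
    return pfa, pd

# Placeholder LLR samples (Gaussian), standing in for the 50 000 simulated LLR values.
rng = np.random.default_rng(0)
llr_h0 = rng.normal(-1.0, 1.5, size=50_000)
llr_h1 = rng.normal(+1.0, 1.5, size=50_000)
pfa, pd = empirical_roc(llr_h0, llr_h1)
```

Plotting `pd` against `pfa` gives one curve per quantization rule, which is how the rules are compared in Figure 7.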

The proposed quantization rule improves the detection performance compared to the MSE-optimal
quantizer. In this example, the ROC curve is close to that obtained using the Gupta-Hero
quantizer. Recall however that in other contexts (e.g., in Scenarios #1 and #2), the Gupta-Hero
quantizer may not even be defined. We must also qualify this observation: our theoretical results
are valid in the asymptotic regime where $N$ and $n$ tend to infinity, that is, in the regime where
the power of the test tends exponentially to one. In practice, the empirical validation of our
result would thus require simulating rare events. This topic is beyond the scope of this paper.

Note that if we interchange $H_0$ and $H_1$, the proposed quantization rule will be different.
This is because the asymptotic regime we are interested in when dealing with error
exponents, i.e., $n$ tends to infinity for a fixed type-I error $\alpha$, restricts attention to one point along
the Neyman-Pearson ROC curve.
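The asymmetry between the two hypotheses can be illustrated in the simpler i.i.d. setting, where Stein's lemma gives the Kullback-Leibler divergence D(P0 || P1) as the error exponent for a fixed type-I error. The sketch below is only an analogy for the correlated case studied here: the two 4-cell probability vectors are hypothetical and merely show that swapping the hypotheses changes the criterion being optimized.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats, for pmfs with full support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# Hypothetical quantized distributions of a 4-cell quantizer under each hypothesis.
p0 = [0.70, 0.20, 0.08, 0.02]
p1 = [0.25, 0.25, 0.25, 0.25]

d01 = kl(p0, p1)   # i.i.d. exponent when H0 is the null hypothesis
d10 = kl(p1, p0)   # i.i.d. exponent when the roles are interchanged
print(d01, d10)
```

Since the two divergences differ, a quantizer designed to preserve one of them has no reason to be the best choice for the other.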




Figure 7: Detection of an MA process - ROC curves (h = [1.06677, -0.59281, 0.09565], $\sigma$ = 1.5,
n = 80, N = 4, 100 000 samples). Horizontal axis: false positive rate ($\alpha$).

VIII. Conclusion

We investigated the performance of the Neyman-Pearson detector used on quantized versions
of a correlated vector-valued stationary process. It was shown that for a constant false alarm
level, the miss probability of the test converges exponentially to zero. We determined the error
exponent and we provided a compact and informative expression of the latter in the context of
high-rate quantization. It is proved in particular that when the number $N$ of quantization levels
tends to infinity, the error exponent converges at speed $N^{-2/d}$ to the ideal error exponent that
one would obtain in the absence of quantization. In the case of scalar quantization, we analytically
characterized the high-rate quantizers minimizing the error exponent loss. In the case of vector
quantization, we proposed a method based on the LBG algorithm in order to construct practical
quantizers with attractive performance.
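The $N^{-2/d}$ convergence speed can be checked numerically in a toy case. The sketch below is an assumption-laden illustration, not one of the paper's experiments: it takes i.i.d. scalar observations (d = 1), for which the error exponent reduces to a Kullback-Leibler divergence, with N(0,1) under H0 and N(1,1) under H1, and a uniform quantizer on [-6, 7] whose outer cells absorb the tails. Doubling N should then divide the exponent loss by roughly four.

```python
import math
import numpy as np

def gaussian_cdf(x, mean=0.0):
    """CDF of a unit-variance Gaussian, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-(x - mean) / math.sqrt(2.0))

def quantized_kl(n_cells, lo=-6.0, hi=7.0):
    """D(P0 || P1) after uniform scalar quantization, for N(0,1) versus N(1,1)."""
    edges = np.linspace(lo, hi, n_cells + 1)
    edges[0], edges[-1] = -np.inf, np.inf       # outer cells absorb the tails
    total = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        q0 = gaussian_cdf(b) - gaussian_cdf(a)            # cell probability under H0
        q1 = gaussian_cdf(b, 1.0) - gaussian_cdf(a, 1.0)  # cell probability under H1
        total += q0 * math.log(q0 / q1)
    return total

ideal = 0.5                                     # D(N(0,1) || N(1,1)) = 1/2 without quantization
loss_64 = ideal - quantized_kl(64)
loss_128 = ideal - quantized_kl(128)
ratio = loss_64 / loss_128
print(loss_64, loss_128, ratio)
```

A ratio close to 4 when N doubles is what a loss proportional to $N^{-2}$ predicts for d = 1.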

We believe that there are many directions for extending these results and mention a few
here. In this paper, observations have absolutely continuous probability distributions w.r.t. the
Lebesgue measure. Following Graf and Luschgy [46, Section 6], who considered measures with
both continuous and singular parts, we could think of an extension of our work to such cases.
We moreover focused on constant false-alarm rate (CFAR) tests. Following the arguments
developed in [18] and using the results of p5] Section III], it could be interesting to study the
whole asymptotic ROC curve and use a global performance criterion like the area under the
curve (AUC). However, this would require a nontrivial extension of Sanov's theorem [47] to
non-i.i.d. time series.

We furthermore think that the framework developed in this paper could be applied in the 
context of parameter estimation. The effect of quantization on performance, measured for instance 
by the Fisher information, could be studied and corresponding optimal vector quantizers could 
be described. 

Appendix A
Proof of Lemma 2

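We write the Taylor-Lagrange expansion of function $y_{-m:u} \mapsto p_1(y_{-m:u})$ at point $\xi_{N,j_{-m:u}}$:

$$p_1(y_{-m:u}) = p_1(\xi_{N,j_{-m:u}}) + \sum_{k=-m}^{u} \nabla_{y_k} p_1(\xi_{N,j_{-m:u}})^T (y_k - \xi_{N,j_k})$$

$$\qquad + \frac{1}{2} \sum_{k,\ell=-m}^{u} (y_k - \xi_{N,j_k})^T \nabla^2_{y_k,y_\ell} p_1(\xi_{N,j_{-m:u}}) (y_\ell - \xi_{N,j_\ell}) + \epsilon_N(y_{-m:u}) , \qquad (45)$$

where $\epsilon_N(y_{-m:u})$ is the third-order remainder, a sum over the block indices $k, \ell, r = -m, \dots, u$
and over the vector components of the third-order partial derivatives of $p_1$ evaluated at
$\theta y_{-m:u} + (1-\theta) \xi_{N,j_{-m:u}}$, for a given $\theta \in [0, 1]$ (see [48]). Plugging expansion (45) into (19) leads to: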



pi,N{i 



N,j- 



N,j- 



k= 
1 " 

k=- 



dyk 



V, 



N,jk 



( ^ M ^fcPl(^A'j-m:«) / . ^ dyk 



2 



Pl{^N,j-m:u] 



Plii 



N,j- 



{ye-i 



N,je) 



dyk dye 



(46) 



where 



:J — m:u 




ewiy-m-.u) dy 

— m:u 

Pli^NJ^ 



We now determine an estimate for this remainder term. For each $y_{-m:u} \in C_{N,j_{-m}} \times \cdots \times C_{N,j_u}$,



k,l,r=-m hfij=l 
1 



Pl{Oy-m:u + (1 - ^)e7Vj_„.J ^ ^ 9?/^^"^ 



+ (1 - 



^ Pl{dy-m:u + {1 - 0)CN,j_^J ^^^^ 

Pl{iN,j.m:u) 

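First, we find a bound for the last factor. To that end, we expand function $z_{-m:u} \mapsto \log p_1(z_{-m:u})$
at point $\xi_{N,j_{-m:u}}$:

$$\log p_1(z_{-m:u}) = \log p_1(\xi_{N,j_{-m:u}}) + \sum_{k=-m}^{u} \nabla_{y_k} \log p_1(\theta' z_{-m:u} + (1-\theta') \xi_{N,j_{-m:u}})^T (z_k - \xi_{N,j_k}) ,$$

for a given $\theta' \in [0, 1]$. From Equation (24), the following inequality holds:

$$\left| \log \frac{p_1(z_{-m:u})}{p_1(\xi_{N,j_{-m:u}})} \right| \le \sum_{k=-m}^{u} \left\| \nabla_{y_k} \log p_1(\theta' z_{-m:u} + (1-\theta') \xi_{N,j_{-m:u}}) \right\| \, \left\| z_k - \xi_{N,j_k} \right\| \le C_1 \sum_{k=-m}^{u} \left\| z_k - \xi_{N,j_k} \right\| .$$
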

Applying the above upper bound at point $z_{-m:u} = \theta y_{-m:u} + (1-\theta) \xi_{N,j_{-m:u}}$ and using
Assumption 3-3), we find



log 



Pl{Oy-m:u + il-0)^N,j.m:u] 



Plii 



< Ci(m + 1) 
for each $y_{-m:u} \in C_{N,j_{-m}} \times \cdots \times C_{N,j_u}$. According to the definition of sequence $m(N)$ (see
Equation (23)), the r.h.s. of the above equation vanishes as $N$ tends to infinity. Consequently, the

Pl(6»y_m:i, + (l-e)5iV,j_^.^) 



term 



2) gives the following upper bound: 



in Equation (47 1 is bounded. This result together with Assumption 



:J — m:u 



< Ct 



m + 1 



for some constant ct- 



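Let us now examine the dominant terms of Equation (46). Recall that $\xi_{N,j}$ is defined as the
centroid of cell $C_{N,j}$:

$$\xi_{N,j} = \frac{1}{V_{N,j}} \int_{C_{N,j}} y \, dy .$$

It is straightforward to prove the following two equalities, for any $j \in \{1, \dots, N\}$ and any
d-by-d matrix $A$:

$$\int_{C_{N,j}} (y - \xi_{N,j}) \, dy = 0 ,$$

$$\int_{C_{N,j}} (y - \xi_{N,j})^T A \, (y - \xi_{N,j}) \, dy = \mathrm{Tr}(A M_{N,j}) \, V_{N,j}^{1+2/d} .$$

Plugging the above identities into Equation (46) and recalling that $\zeta_{N,j} = \frac{1}{N V_{N,j}}$ proves Lemma 2.
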



Appendix B
Proof of Lemma 3

We study each term of the r.h.s. of (32). Writing Taylor-Lagrange expansions of the probability
densities and using the fact that quantization levels are centroids of the cells, we prove the
following three lemmas. Define function $V_N$ on Y by $V_N(y) = V_{N,j}$ whenever $y \in C_{N,j}$.

Lemma 6: For each $k \in \{-m, \dots, u\}$, the following equality holds true:



En 



1 



Eo 



—m:u J 



M^iYk) Vy^PoiY-rnik-l, ZN,k, ^fc+liw) 



where |eAr,fc| < for some constant c'^. 
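Proof: We expand the expectation:

$$\mathbb{E}_0\left[ \nabla_{y_k} \log p_1(Z_{N,-m:u})^T (Y_k - Z_{N,k}) \right] = \sum_{j_{-m:u}} \int_{C_{N,j_{-m}} \times \cdots \times C_{N,j_u}} \nabla_{y_k} \log p_1(\xi_{N,j_{-m:u}})^T (y_k - \xi_{N,j_k}) \, p_0(y_{-m:u}) \, dy_{-m:u} , \qquad (48)$$

where $\sum_{j_{-m:u}}$ denotes summation over all index vectors $j_{-m:u} \in \{1, \dots, N\}^{u+m+1}$.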

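For each $j_k \in \{1, \dots, N\}$, we then consider the Taylor-Lagrange expansion of $y_k \mapsto p_0(y_{-m:u})$
at point $\xi_{N,j_k}$:

$$p_0(y_{-m:u}) = p_0(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) + \nabla_{y_k} p_0(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u})^T (y_k - \xi_{N,j_k}) + \epsilon_{N,k}(y_{-m:u}) , \qquad (49)$$

where

$$\epsilon_{N,k}(y_{-m:u}) = \frac{1}{2} (y_k - \xi_{N,j_k})^T \nabla^2_{y_k} p_0(y_{-m:k-1}, \theta y_k + (1-\theta) \xi_{N,j_k}, y_{k+1:u}) (y_k - \xi_{N,j_k})$$

for a given $\theta \in [0, 1]$. Under Assumption 4-2), from the counterparts of Equations (24), (25) for
density $p_0$ and following the argument of Lemma 2 (see Appendix A), we can find a bound for
this remainder: For each $y_{-m:u} \in C_{N,j_{-m}} \times \cdots \times C_{N,j_u}$,

$$|\epsilon_{N,k}(y_{-m:u})| \le \|y_k - \xi_{N,j_k}\|^2 \, \left\| \nabla^2_{y_k} p_0(y_{-m:k-1}, \theta y_k + (1-\theta) \xi_{N,j_k}, y_{k+1:u}) \right\|$$

$$\qquad = \|y_k - \xi_{N,j_k}\|^2 \, \frac{\left\| \nabla^2_{y_k} p_0(y_{-m:k-1}, \theta y_k + (1-\theta) \xi_{N,j_k}, y_{k+1:u}) \right\|}{p_0(y_{-m:k-1}, \theta y_k + (1-\theta) \xi_{N,j_k}, y_{k+1:u})} \times \frac{p_0(y_{-m:k-1}, \theta y_k + (1-\theta) \xi_{N,j_k}, y_{k+1:u})}{p_0(y_{-m:u})} \times p_0(y_{-m:u})$$

$$\qquad \le c \, \|y_k - \xi_{N,j_k}\|^2 \, p_0(y_{-m:u}) \qquad (50)$$

for some constant $c$.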



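Plugging expansion (49) into (48) leads to two dominant terms $D_{N,1}$ and $D_{N,2}$ and a
remainder $r_N$:

$$\mathbb{E}_0\left[ \nabla_{y_k} \log p_1(Z_{N,-m:u})^T (Y_k - Z_{N,k}) \right] = D_{N,1} + D_{N,2} + r_N .$$

We successively study each of them. The first dominant term is

$$D_{N,1} = \sum_{j_{-m:u}} \int_{C_{N,j_{-m}} \times \cdots \times C_{N,j_u}} \nabla_{y_k} \log p_1(\xi_{N,j_{-m:u}})^T (y_k - \xi_{N,j_k}) \, p_0(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) \, dy_{-m:u}$$

$$\qquad = \sum_{j_{-m:u}} \nabla_{y_k} \log p_1(\xi_{N,j_{-m:u}})^T \left( \int_{C_{N,j_k}} (y_k - \xi_{N,j_k}) \, dy_k \right) \int_{\{C_{N,j_i}\}_{i \neq k}} p_0(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) \, \{dy_i\}_{i \neq k} = 0 ,$$

where $\{dy_i\}_{i \neq k}$ stands for $\prod_{i=-m, i \neq k}^{u} dy_i$. The last equality holds true since we have chosen the
quantization level $\xi_{N,j}$ to be the centroid of cell $C_{N,j}$.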

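The second dominant term is

$$D_{N,2} = \sum_{j_{-m:u}} \int_{C_{N,j_{-m}} \times \cdots \times C_{N,j_u}} \nabla_{y_k} \log p_1(\xi_{N,j_{-m:u}})^T (y_k - \xi_{N,j_k}) (y_k - \xi_{N,j_k})^T \nabla_{y_k} p_0(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) \, dy_{-m:u}$$

$$\qquad = \sum_{j_{-m:u}} \nabla_{y_k} \log p_1(\xi_{N,j_{-m:u}})^T M_{N,j_k} V_{N,j_k}^{1+2/d} \int_{\{C_{N,j_i}\}_{i \neq k}} \nabla_{y_k} p_0(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) \, \{dy_i\}_{i \neq k} . \qquad (51)$$

We now write this equality in a simple form. Obviously, under Assumption 1-2), we can write

$$\nabla_{y_k} p_0(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) = \frac{\nabla_{y_k} p_0(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u})}{p_0(y_{-m:u})} \times p_0(y_{-m:u}) .$$

Note that the above expression is independent of $y_k \in C_{N,j_k}$, so we can also write

$$\nabla_{y_k} p_0(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) = \frac{1}{V_{N,j_k}} \int_{C_{N,j_k}} \frac{\nabla_{y_k} p_0(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u})}{p_0(y_{-m:u})} \, p_0(y_{-m:u}) \, dy_k .$$

Equation (51) thus becomes

$$D_{N,2} = \sum_{j_{-m:u}} \nabla_{y_k} \log p_1(\xi_{N,j_{-m:u}})^T M_{N,j_k} V_{N,j_k}^{2/d} \int_{C_{N,j_{-m}} \times \cdots \times C_{N,j_u}} \frac{\nabla_{y_k} p_0(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u})}{p_0(y_{-m:u})} \, p_0(y_{-m:u}) \, dy_{-m:u}$$

$$\qquad = \frac{1}{N^{2/d}} \, \mathbb{E}_0\left[ \nabla_{y_k} \log p_1(Z_{N,-m:u})^T \frac{M_N(Y_k)}{\zeta_N(Y_k)^{2/d}} \frac{\nabla_{y_k} p_0(Y_{-m:k-1}, Z_{N,k}, Y_{k+1:u})}{p_0(Y_{-m:u})} \right] ,$$

where the last line comes from $\zeta_{N,j} = \frac{1}{N V_{N,j}}$.
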



We complete the proof with a bound on the remainder term: 



E 



J — m:u 



Cjv,j_„x--xCiv,j„ 



(a) 



<CiC ... WVk - ^N{yk)\\ Po{y-m:u) dy^rn:u 



< Ci c 



where inequality (a) is obtained from Equations (24), (50) and (b) is a consequence of
Assumption 3-3).

Putting all pieces together proves Lemma 6. ■
Lemma 7: There exists a constant c'a such that, for each $k \neq \ell \in \{-m, \dots, u\}$,

C'o 



En 



(Ffc - ^iV,fc)^ V logPi(ZAr (Y^; - Z^f^^ 



< 



Proof: For each $k \neq \ell$, we expand the expectation:

$$\mathbb{E}_0\left[ (Y_k - Z_{N,k})^T \nabla^2_{y_k,y_\ell} \log p_1(Z_{N,-m:u}) (Y_\ell - Z_{N,\ell}) \right] = \sum_{j_{-m:u}} \int_{C_{N,j_{-m}} \times \cdots \times C_{N,j_u}} (y_k - \xi_{N,j_k})^T \nabla^2_{y_k,y_\ell} \log p_1(\xi_{N,j_{-m:u}}) (y_\ell - \xi_{N,j_\ell}) \, p_0(y_{-m:u}) \, dy_{-m:u} , \qquad (52)$$

and consider the expansion of $y_k \mapsto p_0(y_{-m:u})$ at point $\xi_{N,j_k}$:

$$p_0(y_{-m:u}) = p_0(y_{-m:k-1}, \xi_{N,j_k}, y_{k+1:u}) + \epsilon'_{N,k}(y_{-m:u}) , \qquad (53)$$

where, from the counterpart of Equation (24) for density $p_0$ and following the argument leading
to Equation (50), $|\epsilon'_{N,k}(y_{-m:u})| \le c' \, \|y_k - \xi_{N,j_k}\| \, p_0(y_{-m:u})$ for some constant $c'$.

Plugging expansion (53) into (52) leads to a dominant term and a remainder. The dominant
term is



E 



J~m:u 



X Po{y )dy 

—m:u 



E 



' / iVk- ^N,jk ) dyk ,y, log Pi (^7v,i_„:„ ) {ye - iNj, ) 



X Po{y-m:k-l,^N,jk^yk+l:u) 



. 



Using Equation (25) and Assumption 3-3), we find a bound for the remainder term:



E 



J — m:u 



ijjk - iN,jJ ^Lvi logPl(^JV,i_„:J {ye - iN,u) eN^k{y-rn:u) dy^ 



< Cod 



^ c' 



]Sfi/d J jys/d 



• (54) 



Lemma 8: For each $k \in \{-m, \dots, u\}$,



En 



{Yk - Z^^kY \. \0gPi{ZN^^rn:u) (^fc " Z 



1 



N,k 



]\f2/d 



E, 



Trivnogp,{Z 



N,—m:u) 



MN{Yk) \ Po{Y-m:k-l, ZN,k,yk+l:u) 



CN{YkYi^ 

Proof: For each k, we expand the expectation: 



where je'^^^l < ^ 



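$$\mathbb{E}_0\left[ (Y_k - Z_{N,k})^T \nabla^2_{y_k} \log p_1(Z_{N,-m:u}) (Y_k - Z_{N,k}) \right] = \sum_{j_{-m:u}} \int_{C_{N,j_{-m}} \times \cdots \times C_{N,j_u}} (y_k - \xi_{N,j_k})^T \nabla^2_{y_k} \log p_1(\xi_{N,j_{-m:u}}) (y_k - \xi_{N,j_k}) \, p_0(y_{-m:u}) \, dy_{-m:u} . \qquad (55)$$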



Plugging expansion (53) into (55) leads to a dominant term and a remainder. The study of the
dominant term uses the same arguments as Lemma 6. The final expression comes from the
following equality:

$$\int_{C_{N,j}} (y - \xi_{N,j})^T A \, (y - \xi_{N,j}) \, dy = \mathrm{Tr}(A M_{N,j}) \, V_{N,j}^{1+2/d}$$

for any d-by-d matrix $A$, and the definition of the specific point density $\zeta_{N,j} = \frac{1}{N V_{N,j}}$.

Equation (54) is also valid when $k = \ell$, i.e., for the remainder considered here. This proves
Lemma 8. ■
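The trace identity used above is easy to check numerically on a simple cell. The sketch below assumes that $M_{N,j}$ denotes the normalized inertia matrix of the cell, i.e. the integral of $(y - \xi)(y - \xi)^T$ over the cell divided by $V^{1+2/d}$, which equals the identity matrix divided by 12 for a cube. The square cell, its side length, the matrix A and the function name are arbitrary choices made for the example.

```python
import numpy as np

def cell_quadratic_integral(A, side, n=400):
    """Midpoint-rule integral of (y - xi)^T A (y - xi) over a square cell centred at xi = 0."""
    # Midpoints of an n-by-n grid covering the square [-side/2, side/2]^2.
    t = (np.arange(n) + 0.5) / n * side - side / 2.0
    y1, y2 = np.meshgrid(t, t, indexing="ij")
    y = np.stack([y1.ravel(), y2.ravel()], axis=1)
    values = np.einsum("ni,ij,nj->n", y, A, y)   # quadratic form at every grid point
    return values.sum() * (side / n) ** 2        # each grid point represents an area (side/n)^2

A = np.array([[2.0, 0.5],
              [0.3, 1.0]])                       # arbitrary d-by-d matrix
side, d = 0.7, 2
volume = side ** d
M = np.eye(d) / 12.0                             # normalized inertia matrix of a cube (assumed definition)

lhs = cell_quadratic_integral(A, side)
rhs = np.trace(A @ M) * volume ** (1 + 2 / d)
print(lhs, rhs)
```

The two printed values agree up to the discretization error of the midpoint rule, and the factor $V^{1+2/d}$ is what turns into the $N^{-2/d}$ scaling once the specific point density is introduced.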
Gathering Equation (32) and Lemmas 6, 7 and 8 results in

MN{Yk) yy^:Po{Y-m■.k-l, ZN,k, Yk+l-.u) 



k=—m 



—m:u ! 



1 " 

k=—m 



Tr Vllogpi(Z 



Yk+ l:u) 



where |eAr(M)| < cu^tid for some constant cu. 

Expanding $\nabla_{y_k} p_0$ and $p_0$ once again, under Assumptions 3 and 4-2), it is straightforward to
write the dominant term in a simple form, i.e., replace each $Z_{N,k}$ by $Y_k$. From Equation (23),
the remainder term is a little-o of $N^{-2/d}$. This proves Lemma 3.

Appendix C
Proof of Lemma 5

Equation (38) ensures that the following series converges:



= Eo [HiV,0(Vloo:0)] + 5^ Eo [H7V,fc(Vloo:0) " HiV,fc(Vloo;-l)] • 

fc=— oo 

Using Equation (35), the approximation of $N^{2/d}(K - K_N)$ by series $\Sigma_N$ leads to the following
remainder:

-m-l 

|iV2/^(ir-ir^)-S^| < E Eo|A;;V E Eo|T^V^^' (56) 

k=—m k=—oo 

where aJJ^ = nN,o{Y-,n:o) - y-NfiiY-oo-.o) and 

A^^ = 'HN,k{Y-m;o) — 'Hn ,kiY^rn;-l) — 'H N ,kiY- oo:q) + 'Hn ,kiY-oo;-l) (V < —1) , 

= •H^,fc(>^-oo:o) - -Hjv.fcl^^-oo:-!) (VA; < -m - 1) , 
and where $\epsilon_N \to 0$ as $N \to \infty$. Using the triangle inequality, we obtain for each $k \le -1$:

Eol A;J^| < Eo InNAY-m-.o) - nN,k{y-oo:0)\ + Eq |HjV,fc(l"-m:-l) " HAr,fc (Yloo:-! ) | • 



Using (37), this leads to: 



Eo|Aj^^| < 2ChiPm-\k\ ■ 

From the triangular inequality once again. 



Using ( [38] ), this leads to: 

After some algebra, there exists a constant ca such that: 

-1 -1 
^ EojAjJ^I < Ca ^ <^m-|fe| A V'ifci 

k=—m k=—m 

I -1 -Vmm 

< Ca 5Z 'frn-\k\ + 5Z ^I'^l 

\fc=— [m/2j k=—m 



oo 

< Ca I ^ fk+ 5Z 

yfc=rrra/2] A:=[m/2J 
<CATt\ 

Where (Ti™^) m>o is a sequence of positive numbers such that 7a™'' — J- as m — oo. The 
last line of the above equation holds true under Assumption |4]-4) since Yl Vk and Yl V'fc are 
convergent series. Similarly, Eo|A^''| < Chfm- 



The last series in (p6|) can be bounded using (38): 



—m—l —m—1 
k=—oo k=—oo 



for some constant cx and a given sequence (7^'"'')m>o such that T^'"'* — )■ as m — )■ oo. 
Putting all pieces together. Equation ( [561 ) leads to: 

\N^/'i(K - /r^) - Ejv| < ch^m + CA ri"^ + CT r^""^ + e> . 

The r.h.s. of the above inequality tends to zero as $m \to \infty$. This proves Lemma 5.